DISTRIBUTION THEORY OF TWO ESTIMATES FOR STANDARD
DEVIATION BASED ON SECOND VARIATE DIFFERENCES*


By A. R. KAMAT 
University College, London 


1. INTRODUCTION
1.1. Preliminary remarks
Anderson (1927) proposed the use of variate differences of various orders to eliminate the
effects of a polynomial trend, and Tintner (1940) has given a systematic account of the
method in his book. Von Neumann, Kent, Bellinson & Hart (1941) advocated the use of the
mean-square successive difference

$$\delta^2 = \sum_{i=1}^{n-1} (x_i - x_{i+1})^2/(n-1),$$

that is, an estimate based on the first variate difference, to estimate $\sigma^2$ when the mean
of the parent population is undergoing a slow-moving continuous trend. Recently the present
author (1953b) has discussed the approximate distribution of another estimate based on the
first variate difference, viz. the mean successive difference

$$d = \sum_{i=1}^{n-1} |x_i - x_{i+1}|/(n-1),$$

which is useful under the same circumstances. In this paper we deal with the approximate
distribution of the following two estimates based on the second variate difference:


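(1) the mean square successive second difference:

$$\delta_2^2 = \frac{1}{n-2}\sum_{i=1}^{n-2} (x_i - 2x_{i+1} + x_{i+2})^2 = \frac{1}{n-2}\sum_{i=1}^{n-2} (\Delta^2 x_i)^2;$$

(2) the mean successive second difference:

$$d_2 = \frac{1}{n-2}\sum_{i=1}^{n-2} |x_i - 2x_{i+1} + x_{i+2}| = \frac{1}{n-2}\sum_{i=1}^{n-2} |\Delta^2 x_i|.$$
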
If the variables $x_i$ are distributed normally and independently with a common mean and
variance $\sigma^2$, then the variate difference estimators of the variance are distinctly less
efficient than the usual estimator $s^2 = \sum_i (x_i - \bar{x})^2/(n-1)$. In fact, it has been shown
(Morse & Grubbs, 1947; Guest, 1951; Kamat, 1953c) that the asymptotic efficiencies are

$\delta^2$: 66.7 %, $\delta_2^2$: 51.4 %, $d$: 60.5 %, $d_2$: 47.1 %.

Variate difference methods may be valuable, however, in cases where it is reasonable to
suppose that independent normal deviations with distribution $N(0, \sigma)$ are superposed upon
a trend, and we wish either to estimate the value of $\sigma$ or, perhaps, to test whether it has
changed from one series of observations to another. In such cases we suppose that
$x_i = \mu_i + z_i$, where $z_i$ is $N(0, \sigma)$ and $\mu_i$ is a slow-moving trend function. If the trend
function $\mu_i$ may be 'locally' represented by a straight line or a parabola with moderate
curvature, the first- or second-order variate difference estimators will be free of the heavy
bias that affects the estimator $s^2$.
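
The effect described above can be illustrated numerically. The following is a minimal sketch, not part of the original paper: the function names and the simulated series are hypothetical, and the rescaling constants 6 and 1.954410 are the means of $\delta_2^2/\sigma^2$ and $d_2/\sigma$ quoted later in the text.

```python
import random

def second_difference_estimates(x):
    """Return (delta2_sq, d2) computed from the second variate differences of x."""
    diffs = [x[i] - 2 * x[i + 1] + x[i + 2] for i in range(len(x) - 2)]
    m = len(diffs)
    delta2_sq = sum(d * d for d in diffs) / m   # mean square successive second difference
    d2 = sum(abs(d) for d in diffs) / m         # mean successive second difference
    return delta2_sq, d2

def sigma_estimates(x):
    """Rescale both statistics so that they estimate sigma under normality."""
    delta2_sq, d2 = second_difference_estimates(x)
    return (delta2_sq / 6.0) ** 0.5, d2 / 1.954410

# Hypothetical series: a slowly curving trend plus N(0, sigma) noise.
rng = random.Random(0)
sigma = 2.0
x = [0.5 * t + 0.01 * t * t + rng.gauss(0.0, sigma) for t in range(2000)]
print(sigma_estimates(x))
```

A purely linear series gives second differences that vanish identically, which is why a locally linear trend leaves both estimates essentially unbiased.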

* This paper is part of a thesis approved for the Ph.D. degree of the University of London.

Different aspects of this subject have been discussed by a number of authors (Morse &
Grubbs, 1947; Keen & Page, 1953). As an illustration of the extent to which the second
variate differences succeed in eliminating trend, we may quote three examples given by
Tintner. The figures quoted below are values of the variances estimated from the squares
of variate differences of various orders for: (1) annual wheat-flour prices (Tintner, 1952,
p. 313); (2) annual wool prices (Tintner, 1940, p. 70); (3) annual raw-silk prices (Tintner,
1940, p. 71). All data are for U.S.A. for the 48 years 1890-1937:








Order p    Ex. (1)    Ex. (2)    Ex. (3)
   0       4.7969     0.1069     2.8914
   1       0.7020     0.0277     0.3849
   2       0.4402     0.0259     0.2714
   3       0.3931     0.0262     0.2317
   4       0.3767     0.0264     0.2033
   5       0.3662     0.0263     0.1824




















Tintner has shown that the estimated variance may be regarded as stabilized at p = 1 
or 2 for Ex. (1), at p = 1 or 2 for Ex. (2) and at p = 2 for Ex. (3). 

As stated above, the object of the present paper is primarily to derive the higher moments
of the sampling distributions of $\delta_2^2$ and $d_2$ and indicate their approximate distributions. In
a further paper it is hoped to discuss various practical uses of distributions based on these
moments and other allied problems.


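1.2. Notation

Let $x_i$ $(i = 1, 2, ..., n)$ denote a sequence of observations from a normal population with
a constant mean and s.d. $\sigma$. Then the second variate difference is defined as

$$\Delta^2 x_i = x_i - 2x_{i+1} + x_{i+2},$$

and the two estimates discussed below are:

$$\delta_2^2 = \sum_{i=1}^{n-2} (\Delta^2 x_i)^2/(n-2) \quad\text{and}\quad d_2 = \sum_{i=1}^{n-2} |\Delta^2 x_i|/(n-2). \tag{1}$$

For simplification we shall use the following notation. Let $\sigma' = \sigma(\Delta^2 x_i) = \sqrt{6}\,\sigma$; then we
define

$$z_i = \Delta^2 x_i/(\sqrt{6}\,\sigma) = \Delta^2 x_i/\sigma' \quad (i = 1, 2, ..., n-2), \tag{2}$$

so that $\sigma(z_i) = 1$ and

$$\begin{aligned}
\rho_1 &= \rho(z_i, z_{i+1}) = \rho(\Delta^2 x_i, \Delta^2 x_{i+1}) = -\tfrac{2}{3},\\
\rho_2 &= \rho(z_i, z_{i+2}) = \rho(\Delta^2 x_i, \Delta^2 x_{i+2}) = \tfrac{1}{6},\\
\rho_m &= \rho(z_i, z_{i+m}) = \rho(\Delta^2 x_i, \Delta^2 x_{i+m}) = 0 \quad (m > 2).
\end{aligned} \tag{3}$$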
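
The correlations of the standardized second differences can be checked directly from the coefficients (1, -2, 1). The sketch below is illustrative only (the function name is hypothetical); it forms the covariance matrix of the second differences of independent unit-variance observations in exact rational arithmetic and divides by the common variance 6.

```python
from fractions import Fraction

def second_difference_correlations(m):
    """Correlation matrix of m consecutive second differences of i.i.d. observations."""
    n = m + 2
    rows = []
    for i in range(m):
        row = [0] * n
        row[i], row[i + 1], row[i + 2] = 1, -2, 1   # coefficients of the i-th second difference
        rows.append(row)
    # Covariances are inner products of coefficient rows; each variance is 1 + 4 + 1 = 6.
    cov = [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]
    return [[Fraction(c, 6) for c in row] for row in cov]

R = second_difference_correlations(5)
print(R[0][:4])   # lags 0, 1, 2, 3
```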





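2. THE DISTRIBUTION THEORY OF THE MEAN-SQUARE SUCCESSIVE SECOND DIFFERENCE, $\delta_2^2$
2.1. Preliminary results
Using the $z$ variables defined in (2) above we have

$$E\{\textstyle\sum(\Delta^2 x_i)^2\} = \{E(\textstyle\sum z_i^2)\}\,\sigma'^2, \qquad E\{\textstyle\sum(\Delta^2 x_i)^2\}^2 = \{E(\textstyle\sum z_i^2)^2\}\,\sigma'^4, \quad\text{etc.} \tag{4}$$

In order to obtain the first four moments of $\delta_2^2$ we have therefore to evaluate the following
expectations, which can be found from the moments of multivariate normal distributions
up to the four-variate case by substituting the appropriate values of $\rho_m$ ($m = 1, 2$ and $> 2$):

$$\begin{aligned}
&E(z_1^2) = 1,\\
&E(z_1^4) = 3, \quad E(z_1^2 z_2^2) = \tfrac{17}{9}, \quad E(z_1^2 z_3^2) = \tfrac{19}{18},\\
&E(z_1^6) = 15, \quad E(z_1^4 z_2^2) = \tfrac{25}{3}, \quad E(z_1^4 z_3^2) = \tfrac{10}{3},\\
&E(z_1^2 z_2^2 z_3^2) = \tfrac{185}{54}, \quad E(z_1^2 z_2^2 z_4^2) = \tfrac{35}{18}, \quad E(z_1^2 z_3^2 z_5^2) = \tfrac{10}{9},\\
&E(z_1^8) = 105, \quad E(z_1^6 z_2^2) = 55, \quad E(z_1^6 z_3^2) = \tfrac{35}{2},\\
&E(z_1^4 z_2^4) = \tfrac{1235}{27}, \quad E(z_1^4 z_3^4) = \tfrac{595}{54}, \quad E(z_1^4 z_2^2 z_3^2) = \tfrac{410}{27},\\
&E(z_1^2 z_2^4 z_3^2) = \tfrac{1195}{54}, \quad E(z_1^4 z_2^2 z_4^2) = \tfrac{17}{2}, \quad E(z_1^2 z_2^4 z_4^2) = \tfrac{242}{27},\\
&E(z_1^2 z_2^2 z_4^4) = 6, \quad E(z_1^4 z_3^2 z_5^2) = \tfrac{7}{2}, \quad E(z_1^2 z_3^4 z_5^2) = \tfrac{199}{54},\\
&E(z_1^2 z_2^2 z_3^2 z_4^2) = \tfrac{643}{108}, \quad E(z_1^2 z_2^2 z_4^2 z_5^2) = \tfrac{587}{162},\\
&E(z_1^2 z_2^2 z_3^2 z_5^2) = \tfrac{286}{81}, \quad E(z_1^2 z_3^2 z_4^2 z_6^2) = \tfrac{649}{324},\\
&E(z_1^2 z_2^2 z_4^2 z_6^2) = \tfrac{166}{81}, \quad E(z_1^2 z_3^2 z_5^2 z_7^2) = \tfrac{379}{324}.
\end{aligned} \tag{5}$$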
2.2. Moments of $\delta_2^2$
Let $n - 2 = m$ and $\sum_{i=1}^{m} z_i^2 = Z$; then

$$\begin{aligned}
E(Z) ={}& m\,E(z_1^2),\\
E(Z^2) ={}& m\,E(z_1^4) + 2(m-1)E(z_1^2 z_2^2) + 2(m-2)E(z_1^2 z_3^2) + (m-2)(m-3)E^2(z_1^2),\\
E(Z^3) ={}& m\,E(z_1^6) + 6(m-1)E(z_1^4 z_2^2) + 6(m-2)E(z_1^4 z_3^2)\\
&+ 3(m-2)(m-3)E(z_1^4)E(z_1^2) + 6(m-2)E(z_1^2 z_2^2 z_3^2)\\
&+ 12(m-3)E(z_1^2 z_2^2 z_4^2) + 6(m-4)E(z_1^2 z_3^2 z_5^2)\\
&+ 6(m-3)(m-4)E(z_1^2 z_2^2)E(z_1^2) + 6(m-4)(m-5)E(z_1^2 z_3^2)E(z_1^2)\\
&+ (m-4)(m-5)(m-6)E^3(z_1^2),\\
E(Z^4) ={}& m\,E(z_1^8) + 8(m-1)E(z_1^6 z_2^2) + 8(m-2)E(z_1^6 z_3^2)\\
&+ 4(m-2)(m-3)E(z_1^6)E(z_1^2) + 6(m-1)E(z_1^4 z_2^4)\\
&+ 6(m-2)E(z_1^4 z_3^4) + 3(m-2)(m-3)E^2(z_1^4)\\
&+ 24(m-2)E(z_1^4 z_2^2 z_3^2) + 12(m-2)E(z_1^2 z_2^4 z_3^2)\\
&+ 24(m-3)E(z_1^4 z_2^2 z_4^2) + 24(m-3)E(z_1^2 z_2^4 z_4^2)\\
&+ 24(m-3)E(z_1^2 z_2^2 z_4^4) + 24(m-4)E(z_1^4 z_3^2 z_5^2)\\
&+ 12(m-4)E(z_1^2 z_3^4 z_5^2) + 24(m-3)(m-4)E(z_1^4 z_2^2)E(z_1^2)\\
&+ 24(m-4)(m-5)E(z_1^4 z_3^2)E(z_1^2) + 12(m-3)(m-4)E(z_1^2 z_2^2)E(z_1^4)\\
&+ 12(m-4)(m-5)E(z_1^2 z_3^2)E(z_1^4) + 6(m-4)(m-5)(m-6)E(z_1^4)E^2(z_1^2)\\
&+ 24(m-3)E(z_1^2 z_2^2 z_3^2 z_4^2) + 24(m-4)E(z_1^2 z_2^2 z_4^2 z_5^2)\\
&+ 48(m-4)E(z_1^2 z_2^2 z_3^2 z_5^2) + 24(m-5)E(z_1^2 z_3^2 z_4^2 z_6^2)\\
&+ 48(m-5)E(z_1^2 z_2^2 z_4^2 z_6^2) + 24(m-6)E(z_1^2 z_3^2 z_5^2 z_7^2)\\
&+ 24(m-4)(m-5)E(z_1^2 z_2^2 z_3^2)E(z_1^2)\\
&+ 48(m-5)(m-6)E(z_1^2 z_2^2 z_4^2)E(z_1^2)\\
&+ 24(m-6)(m-7)E(z_1^2 z_3^2 z_5^2)E(z_1^2)\\
&+ 12(m-4)(m-5)E^2(z_1^2 z_2^2) + 24(m-5)(m-6)E(z_1^2 z_2^2)E(z_1^2 z_3^2)\\
&+ 12(m-6)(m-7)E^2(z_1^2 z_3^2)\\
&+ 12(m-5)(m-6)(m-7)E(z_1^2 z_2^2)E^2(z_1^2)\\
&+ 12(m-6)(m-7)(m-8)E(z_1^2 z_3^2)E^2(z_1^2)\\
&+ (m-6)(m-7)(m-8)(m-9)E^4(z_1^2).
\end{aligned} \tag{6}$$













Substituting the values of various terms in (6) from (5), after considerable simplifications,
we have

$$\begin{aligned}
\mu_1'(Z) &= m,\\
\mu_2'(Z) &= m^2 + \tfrac{35}{9}m - 2,\\
\mu_3'(Z) &= m^3 + \tfrac{35}{3}m^2 + \tfrac{254}{9}m - \tfrac{280}{9},\\
\mu_4'(Z) &= m^4 + \tfrac{70}{3}m^3 + \tfrac{4597}{27}m^2 + \tfrac{2750}{9}m - \tfrac{15664}{27}.
\end{aligned} \tag{7}$$

For $\delta_2^2 = Z\sigma'^2/m$, therefore, we obtain the moments

$$\begin{aligned}
\mu_1' &= \sigma'^2, & \mu_3 &= \frac{308m - 280}{9m^3}\,\sigma'^6,\\
\mu_2 &= \frac{35m - 18}{9m^2}\,\sigma'^4, & \mu_4 &= \frac{1}{27m^4}(1225m^2 + 11610m - 15664)\,\sigma'^8.
\end{aligned} \tag{8}$$

Since $\sigma'^2 = 6\sigma^2$, still writing $m = n - 2$, we have for the statistic $\delta_2^2/\sigma^2$,

$$\begin{aligned}
\mu_1' &= 6, & \beta_1 &= 9(308m - 280)^2/(35m - 18)^3,\\
\mu_2 &= 4(35m - 18)/m^2, & \beta_2 &= 3 + 3(12870m - 15988)/(35m - 18)^2.
\end{aligned} \tag{9}$$

It is to be noted that the formulae for $\mu_3$ and $\beta_1$ hold for m > 4 (n > 6), and those for $\mu_4$ and
$\beta_2$ hold for m > 6 (n > 8). The characteristic equation of the matrix of the quadratic form
$\sum(\Delta^2 x_i)^2$, the derivation of which is considered in the last section, provides a check on the
long algebra of the evaluation of the moments.
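
An independent check of these moment formulae is available from a standard result that the paper does not use at this point: if $z$ is multivariate normal with correlation matrix $R$, the $r$-th cumulant of $Z = \sum z_i^2$ is $2^{r-1}(r-1)!\,\mathrm{tr}(R^r)$. The following sketch (hypothetical function names, exact rational arithmetic) evaluates the first four cumulants of $Z$ for the banded matrix defined by $\rho_1 = -2/3$, $\rho_2 = 1/6$.

```python
from fractions import Fraction

def corr_matrix(m):
    """Banded correlation matrix of the standardized second differences."""
    rho = {0: Fraction(1), 1: Fraction(-2, 3), 2: Fraction(1, 6)}
    return [[rho.get(abs(i - j), Fraction(0)) for j in range(m)] for i in range(m)]

def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def cumulants_of_Z(m):
    """First four cumulants of Z: kappa_r = 2**(r-1) * (r-1)! * trace(R**r)."""
    R = corr_matrix(m)
    factors = [1, 2, 8, 48]
    power, out = R, []
    for factor in factors:
        out.append(factor * sum(power[i][i] for i in range(m)))
        power = matmul(power, R)
    return out

print(cumulants_of_Z(7))   # m = 7 corresponds to n = 9
```

Under this assumption the second and third cumulants should equal $(35m - 18)/9$ and $(308m - 280)/9$, and the fourth should equal $(12870m - 15988)/27$, which is $\mu_4 - 3\mu_2^2$ for $Z$.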


2.3. The distribution of $\delta_2^2$
Except in the very exceptional case of n = 4, which is discussed below in § 2.4, it seems
difficult to obtain the exact distribution of $\delta_2^2$. Table 1 gives moment constants for $\delta_2^2$ for
n = 5, 7, 10, 15, 20, 25, 30, 40, 50. The approach to normality is slow and the $(\beta_1, \beta_2)$ points
lie in the Pearson Type VI region, which suggests that a curve of this type may provide a
good approximation to the distribution.





Table 1. Standard deviation and $\beta_1$, $\beta_2$ values for $\delta_2^2/\sigma^2$

  n     σ(δ_2²/σ²)     β_1        β_2
  5       6.2183      5.6684    11.9679
  7       5.0120      3.6922     8.8861
 10       4.0446      2.3870     6.8010
 15       3.2161      1.4956     5.3772
 20       2.7487      1.0880     4.7275
 25       2.4394      0.8548     4.3563
 30       2.2154      0.7038     4.1163
 40       1.9064      0.5201     3.8245
 50       1.6987      0.4124     3.6536

Note. The mean of $\delta_2^2/\sigma^2$ is 6.
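2.4. Exact distribution of $\delta_2^2$ for n = 4

For n = 4 the exact distribution of $\delta_2^2$ can be obtained by a method similar to the one by
which von Neumann et al. (1941) have obtained the exact distribution of the mean-square
successive difference $\delta^2$ for n = 3. In this case

$$\Pr\{\delta_2^2/\sigma^2 \le \delta_0^2\} = \Pr\Big\{\frac{1}{2\sigma^2}\big[(\Delta^2 x_1)^2 + (\Delta^2 x_2)^2\big] \le \delta_0^2\Big\} = \Pr\{z_1^2 + z_2^2 \le \tfrac{1}{3}\delta_0^2\}, \tag{10}$$

where $z_i = \Delta^2 x_i/(\sqrt{6}\,\sigma)$ $(i = 1, 2)$; $\sigma(z_i) = 1$, $\rho(z_1, z_2) = -\tfrac{2}{3}$. Now

$$\Pr\{z_1^2 + z_2^2 \le \tfrac{1}{3}\delta_0^2\} = \frac{3}{2\pi\sqrt{5}} \iint_{z_1^2 + z_2^2 \le \frac{1}{3}\delta_0^2} \exp\{-\tfrac{9}{10}(z_1^2 + z_2^2 + \tfrac{4}{3}z_1 z_2)\}\,dz_1\,dz_2.$$

Normalizing the quadratic form, this integral can be written as

$$\frac{3}{2\pi\sqrt{5}} \iint_{z_1'^2 + z_2'^2 \le \frac{1}{3}\delta_0^2} \exp\{-\tfrac{3}{10}(z_1'^2 + 5z_2'^2)\}\,dz_1'\,dz_2' = \frac{3}{2\pi\sqrt{5}} \int_0^{\delta_0/\sqrt{3}} r\,e^{-\frac{3}{10}r^2} \Big(\int_0^{2\pi} e^{-\frac{6}{5}r^2\sin^2\theta}\,d\theta\Big)\,dr. \tag{11}$$

The substitution $\tfrac{6}{5}r^2 = -2iu$, $\theta = \tfrac{1}{4}\pi - \tfrac{1}{2}\theta'$ transforms the inner integral into

$$\int_0^{2\pi} e^{2iu\sin^2\theta}\,d\theta = 2\pi\,e^{iu} J_0(u),$$

where $J_0(u)$ is the Bessel function of the zero order. Therefore, finally,

$$\Pr\{\delta_2^2/\sigma^2 \le \delta_0^2\} = \frac{3}{\sqrt{5}} \int_0^{\delta_0/\sqrt{3}} r\,e^{-\frac{9}{10}r^2} J_0(\tfrac{3}{5}ir^2)\,dr, \tag{12}$$

and the probability density function is given by

$$p(\delta_2^2/\sigma^2) = \frac{1}{2\sqrt{5}} \exp\Big(-\frac{3\delta_2^2}{10\sigma^2}\Big) J_0\Big(\frac{i\,\delta_2^2}{5\sigma^2}\Big). \tag{13}$$
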
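
The n = 4 density can be checked numerically. The sketch below is an illustration under stated assumptions rather than the author's procedure: it writes the density of $t = \delta_2^2/\sigma^2$ as $\frac{1}{2\sqrt{5}}e^{-3t/10}I_0(t/5)$, using the identity $J_0(ix) = I_0(x)$ for the modified Bessel function, sums the power series of $I_0$ directly, and integrates by Simpson's rule on a truncated range. All function names are hypothetical.

```python
import math

def bessel_i0(x, terms=100):
    """Modified Bessel function I_0(x) from its power series sum (x^2/4)^k / (k!)^2."""
    q = (x / 2.0) ** 2
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
    return total

def density(t):
    """Assumed density of t = delta_2^2 / sigma^2 for n = 4."""
    return math.exp(-0.3 * t) * bessel_i0(t / 5.0) / (2.0 * math.sqrt(5.0))

def simpson(f, a, b, steps=4000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

# The integrand decays like exp(-t/10), so truncating at t = 200 loses very little.
total_probability = simpson(density, 0.0, 200.0)
mean = simpson(lambda t: t * density(t), 0.0, 200.0)
print(total_probability, mean)
```

The total probability should be close to 1 and the mean close to 6, the value of $\mu_1'$ for $\delta_2^2/\sigma^2$.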


3. THE DISTRIBUTION THEORY OF THE MEAN SUCCESSIVE SECOND DIFFERENCE $d_2$
3.1. Preliminary results

Using the $z$-variables defined in § 1.2 above we have

$$E(\textstyle\sum|\Delta^2 x_i|) = \{E(\textstyle\sum|z_i|)\}\,\sigma', \qquad E(\textstyle\sum|\Delta^2 x_i|)^2 = \{E(\textstyle\sum|z_i|)^2\}\,\sigma'^2, \quad\text{etc.} \tag{14}$$

In order to evaluate the first four moments of $d_2$ we require the values of the following
expectations in addition to some of the formulae given in (5) above. They are found by
substituting the appropriate values of $\rho_m$ in the absolute moments given in Kamat (1953a).
The last six expectations, viz. those of the form $E(|z_i||z_j||z_k||z_l|)$, have been evaluated
from the expansion for the absolute moment (1, 1, 1, 1) given in the same paper. The values
of $E(|z_i||z_j||z_k||z_l|)$ given below are believed to be correct to the last figure in each
case.








where 


and 




@(|z,|) = Jz E(| z,22|) = 


2 213 
alap=2/5, slaad= (25, eaap=,/? 


ae, lla \461) 
TT > 


Y( 
) 


E(| 212923 |) = 


2 
E(| 2 2,24|) = (- 











2 (us 20, 


os" 9 


18 108 

19 

Me) + iat Hbs—4,), 
194) 54g, 





3 +5), @(| 2,23 |) = 


2 





1 36’ 


), 


190, 
18 


= Hy/(35 


237 


73): 


+404); 


2\? / 
(lize) = (=) (“5 )+49,-s66,), 

2 (22/5 2 (73,/(3 
(\2t41) = 5 a7 + 2h); é(|stza)) => | a +40 
' 2 (53./5 17,/(35) 
6(\tey20|) = = (FA +844), E(| z,2225|) = = ( alae 

2 (13 ./(35) 253 
6(| ehzy24) => ( or +40), E(| 228% |) = 3, 
2 (37/5 (37 4/(35 
(yet) => (Gey + 44), (| 242525 |) = -(= 216 
219 
&(| z,22z,.|) = —= 
(| 1“3 5 | 718 
0, = sin-12 6, = sin}, 
2 
?, = sin-! 77 ¢, =sin—.,_ $, = sin- 
‘ “Nn 
2 
dy, = sin-! 57’ ds as sin-! J(35)’ vo) = sin-! 1 ’ 
/9\ 2 
E(| 222 2324 |) = 7 1-80, 
/9\ 2 
E(| 21242525 |) = 2 1-51, 
2 


E(| 2123242, |) = 


— 
bo 
or 
“~ 


(; 
| 
6(|21227425|) = ( 
( 
( 


E(| 21292 4% |) = 


a 
bo 
on 
i) 
—) 


0 
_ 
or 
bo 
— 


Nino Niv 


Fi 


(15) 


5) 
3.2. Moments of $d_2$

Again writing $m = n - 2$ and putting $Y = \sum_{i=1}^{n-2} |z_i|$, we can expand $E(Y)$, $E(Y^2)$, etc., into
expressions which are obtained from the right-hand side of (6) by replacing $z_i^{2k}$ by $|z_i|^k$
$(i = 1, 2, 3, 4, 5, 6, 7;\ k = 1, 2, 3, 4)$. Substituting in the resulting expressions the values of
the various terms from (5) and (15), after considerable simplification, we finally get the
first four moments of $d_2$ as follows:

$$\mu_1' = \sqrt{2/\pi}\;\sigma' = 1.954410\,\sigma, \tag{16}$$

$$\mu_2 = (1.062321m^{-1} - 0.519368m^{-2})\,\frac{2}{\pi}\,\sigma'^2 = (4.057769m^{-1} - 1.983839m^{-2})\,\sigma^2, \tag{17}$$

$$\mu_3 = (1.988102m^{-2} - 1.666781m^{-3})\Big(\frac{2}{\pi}\Big)^{3/2}\sigma'^3, \tag{18}$$

$$\mu_4 = \{3(1.062321)^2 m^{-2} + 1.494065m^{-3} - 4.864367m^{-4}\}\Big(\frac{2}{\pi}\Big)^{2}\sigma'^4. \tag{19}$$

From (17), (18) and (19) we have

$$\beta_1 = \frac{(1.988102m - 1.666781)^2}{(1.062321m - 0.519368)^3} \quad\text{and}\quad \beta_2 = 3 + \frac{4.8045m - 5.6736}{(1.062321m - 0.519368)^2}. \tag{20}$$

It may be noted that there is hardly any point in retaining more than three or four places
of decimals in the coefficients in $\beta_2$ because of the limited accuracy in the values of
$E(|z_1 z_2 z_3 z_4|)$, etc., given at the end of § 3.1. The expressions for $\mu_3$ and $\beta_1$ hold for m > 4
(n > 6) and those for $\mu_4$ and $\beta_2$ for m > 6 (n > 8).
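
The moment constants of $d_2/\sigma$ follow mechanically from the numerical coefficients quoted in (17) and (20). The short sketch below (the function name is hypothetical) evaluates them for a given sample size; its output can be compared with the tabulated standard deviation and $\beta_1$, $\beta_2$ values for $d_2$.

```python
import math

def d2_moment_constants(n):
    """Standard deviation of d2/sigma and beta_1, beta_2, from the coefficients in (17) and (20)."""
    m = n - 2
    variance = 4.057769 / m - 1.983839 / m ** 2
    denom = 1.062321 * m - 0.519368
    beta1 = (1.988102 * m - 1.666781) ** 2 / denom ** 3
    beta2 = 3.0 + (4.8045 * m - 5.6736) / denom ** 2
    return math.sqrt(variance), beta1, beta2

for n in (7, 10, 20, 50):
    sd, b1, b2 = d2_moment_constants(n)
    print(n, round(sd, 4), round(b1, 3), round(b2, 3))
```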


3.3. The distribution of $d_2$

Except in the very exceptional case of n = 4, where the exact distribution can be obtained
as shown below in § 3.4, it seems difficult to obtain the exact distribution of $d_2$. Table 2
gives moment constants for $d_2$ for n = 5, 7, 10, 15, 20, 25, 30, 40, 50. The $(\beta_1, \beta_2)$ points tend
to the normal point far more quickly than those for $\delta_2^2$ (see Table 1) and lie in the Pearson
Type I region.

Table 2. Standard deviation and $\beta_1$, $\beta_2$ values for $d_2$

  n     σ(d_2/σ)     β_1      β_2
  5      1.0640     0.972    4.225
  7      0.8557     0.622    3.799
 10      0.6901     0.399    3.515
 15      0.5481     0.249    3.322
 20      0.4683     0.181    3.234
 25      0.4155     0.142    3.183
 30      0.3773     0.117    3.151
 40      0.3247     0.086    3.111
 50      0.2893     0.068    3.088

The mean of $d_2/\sigma$ is 1.954410 for all values of n. The last figure of $\beta_2$ for n > 10 is not reliable.


3.4. Exact distribution of $d_2$ for n = 4
In this case, following the method given by Kamat (1953b) for $d$ for n = 3, we have

$$\Pr\{d_2/\sigma \le d_0\} = \Pr\Big\{\frac{1}{2\sigma}\big(|\Delta^2 x_1| + |\Delta^2 x_2|\big) \le d_0\Big\} = \Pr\{|z_1| + |z_2| \le \sqrt{\tfrac{2}{3}}\,d_0\}, \tag{21}$$

where $z_i = \Delta^2 x_i/(\sqrt{6}\,\sigma)$ $(i = 1, 2)$; and $\sigma(z_i) = 1$, $\rho(z_1, z_2) = -\tfrac{2}{3}$. Therefore

$$\Pr\{d_2/\sigma \le d_0\} = 2\Big[\iint_{x+y \le \sqrt{2/3}\,d_0} f(x, y; \tfrac{2}{3})\,dx\,dy + \iint_{x+y \le \sqrt{2/3}\,d_0} f(x, y; -\tfrac{2}{3})\,dx\,dy\Big] \quad (x, y > 0), \tag{22}$$

where

$$f(x, y; \rho) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\Big[-\frac{1}{2(1-\rho^2)}(x^2 + y^2 - 2\rho xy)\Big].$$

The expression on the right-hand side of (22) can be transformed by linear transformations
into $p_\rho(h, k)$ functions defined by

$$p_\rho(h, k) = \int_h^\infty \int_k^\infty f(x, y; \rho)\,dx\,dy,$$

which are given in Tables VIII and IX of Karl Pearson's (1931) Tables for Statisticians
and Biometricians, Part II. The probability integral (21) can therefore be expressed as

$$\Pr\{d_2/\sigma \le d_0\} = 1 - 2p_{1/\sqrt{6}}(d_0, 0) - 2p_{\sqrt{5/6}}(d_0/\sqrt{5}, 0) + 2p_{-1/\sqrt{6}}(d_0, 0) + 2p_{-\sqrt{5/6}}(d_0/\sqrt{5}, 0). \tag{23}$$
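
For numerical work the n = 4 probability integral can also be evaluated without bivariate tables. The sketch below rests on an observation that is not made in the text and should be read as an assumption to be checked: since $|a| + |b| = \max(|a+b|, |a-b|)$, and the sum and difference of the two second differences are uncorrelated normal variables with variances $4\sigma^2$ and $20\sigma^2$, the probability factorizes into a product of two univariate normal probabilities. The function names are hypothetical, and a small simulation is included as a cross-check.

```python
import math
import random

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cdf_d2_n4(d0):
    """Pr{d2/sigma <= d0} for n = 4 under the product-form assumption described above."""
    if d0 <= 0.0:
        return 0.0
    return (2.0 * normal_cdf(d0) - 1.0) * (2.0 * normal_cdf(d0 / math.sqrt(5.0)) - 1.0)

def simulate_cdf(d0, trials=100000, seed=1):
    """Monte Carlo estimate of the same probability with sigma = 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(4)]
        d2 = (abs(x[0] - 2 * x[1] + x[2]) + abs(x[1] - 2 * x[2] + x[3])) / 2.0
        hits += d2 <= d0
    return hits / trials

print(cdf_d2_n4(2.0), simulate_cdf(2.0))
```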


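4. THE CHARACTERISTIC EQUATION OF THE MATRIX OF THE QUADRATIC FORM $\sum_{i=1}^{n-2} (\Delta^2 x_i)^2$
The matrix A of the quadratic form

$$\sum_{i=1}^{n-2} (\Delta^2 x_i)^2 = \sum_{i=1}^{n-2} (x_i - 2x_{i+1} + x_{i+2})^2 = xAx' \tag{24}$$

is given by

$$A = \begin{pmatrix}
1 & -2 & 1 & & & & \\
-2 & 5 & -4 & 1 & & & \\
1 & -4 & 6 & -4 & 1 & & \\
 & \ddots & \ddots & \ddots & \ddots & \ddots & \\
 & & 1 & -4 & 6 & -4 & 1 \\
 & & & 1 & -4 & 5 & -2 \\
 & & & & 1 & -2 & 1
\end{pmatrix}, \tag{25}$$

and therefore the characteristic roots of A are given by

$$|A - \lambda I| = 0. \tag{26}$$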
If $\{x_r\}$ is the characteristic vector corresponding to a root $\lambda$ of (26) the following equations
are satisfied:

$$\begin{aligned}
(1-\lambda)x_1 - 2x_2 + x_3 &= 0,\\
-2x_1 + (5-\lambda)x_2 - 4x_3 + x_4 &= 0,\\
x_{r-2} - 4x_{r-1} + (6-\lambda)x_r - 4x_{r+1} + x_{r+2} &= 0 \quad (r = 3, 4, ..., n-2),\\
x_{n-3} - 4x_{n-2} + (5-\lambda)x_{n-1} - 2x_n &= 0,\\
x_{n-2} - 2x_{n-1} + (1-\lambda)x_n &= 0.
\end{aligned} \tag{27}$$

We have therefore to find the solution of the difference equation

$$x_{r-2} - 4x_{r-1} + (6-\lambda)x_r - 4x_{r+1} + x_{r+2} = 0 \quad (r = 1, 2, ..., n), \tag{28}$$

which satisfies the boundary conditions

$$\begin{aligned}
x_0 - 2x_1 + x_2 &= 0, & x_{-1} - 2x_0 + x_1 &= 0,\\
x_{n-1} - 2x_n + x_{n+1} &= 0, & x_n - 2x_{n+1} + x_{n+2} &= 0.
\end{aligned} \tag{29}$$

Let $x_r = p^r$ be a solution of (28); then $p$ satisfies the equation

$$p^2 + p^{-2} - 4(p + p^{-1}) + 6 - \lambda = 0. \tag{30}$$

Substituting $p + p^{-1} = u$ we get for $u$ the quadratic $u^2 - 4u + 4 - \lambda = 0$, which has the roots
$u = 2 \pm \lambda^{1/2}$. Since $p + p^{-1} = u$, i.e. $p^2 - pu + 1 = 0$, $p = \tfrac{1}{2}\{u \pm \sqrt{(u^2 - 4)}\}$, which leads to the
following two pairs of roots of (30):

$$\begin{aligned}
p_1, p_2 &= \tfrac{1}{2}\{2 + \lambda^{1/2} \pm \sqrt{(\lambda + 4\lambda^{1/2})}\},\\
p_3, p_4 &= \tfrac{1}{2}\{2 - \lambda^{1/2} \pm \sqrt{(\lambda - 4\lambda^{1/2})}\}.
\end{aligned} \tag{31}$$

Now $p_1 p_2 = p_3 p_4 = 1$, and we may therefore take

$$p_1 = e^{\phi}, \quad p_2 = e^{-\phi}, \quad p_3 = e^{i\theta}, \quad p_4 = e^{-i\theta}, \tag{32}$$

where

$$2 + \lambda^{1/2} = 2\cosh\phi \quad\text{and}\quad 2 - \lambda^{1/2} = 2\cos\theta, \tag{33}$$

and write the general solution of (28) as

$$x_r = A\cos r\theta + B\sin r\theta + C\cosh r\phi + D\sinh r\phi, \tag{34}$$

with the boundary conditions given by (29). If we substitute (34) in these boundary
conditions and eliminate A, B, C, D we obtain an equation in $\theta$ and $\phi$ which is the characteristic
equation provided we substitute for $\theta$ and $\phi$ in terms of $\lambda$ from (33).

By altering the notation for the characteristic vector it is possible to simplify the
characteristic equation and split it into two equations. Let us take, for instance, n = 2m + 1
and write the characteristic vector

$$x = \{x_{-m}, x_{-(m-1)}, ..., x_{-1}, x_0, x_1, ..., x_{m-1}, x_m\}.$$

Then the boundary conditions (29) become

$$\begin{aligned}
x_{-(m+1)} - 2x_{-m} + x_{-(m-1)} &= 0, & x_{-(m+2)} - 2x_{-(m+1)} + x_{-m} &= 0,\\
x_{m+1} - 2x_m + x_{m-1} &= 0, & x_{m+2} - 2x_{m+1} + x_m &= 0.
\end{aligned} \tag{35}$$

The substitution from (34) in (35) and the subsequent elimination of A, B, C, D gives

$$\begin{vmatrix}
\cos m\theta & \cosh m\phi & \sin m\theta & -\sinh m\phi\\
\cos(m+1)\theta & \cosh(m+1)\phi & \sin(m+1)\theta & -\sinh(m+1)\phi\\
\cos m\theta & \cosh m\phi & -\sin m\theta & \sinh m\phi\\
\cos(m+1)\theta & \cosh(m+1)\phi & -\sin(m+1)\theta & \sinh(m+1)\phi
\end{vmatrix} = 0,$$

which splits into

$$\cos m\theta\,\cosh(m+1)\phi - \cos(m+1)\theta\,\cosh m\phi = 0, \tag{36}$$

and

$$\sin m\theta\,\sinh(m+1)\phi - \sin(m+1)\theta\,\sinh m\phi = 0. \tag{37}$$

It seems to be difficult to obtain the roots $\lambda$ in the general case. But for any given value of
m (i.e. of n = 2m + 1) we can expand $\cos m\theta$, $\cosh(m+1)\phi$, etc.; substitute $\lambda$ from (33) and
obtain the two equations in $\lambda$ whose roots are the characteristic roots of A. The case when
n is even (= 2m) may be treated in a similar way by taking

$$x = \{x_{-(m-\frac{1}{2})}, x_{-(m-\frac{3}{2})}, ..., x_{-\frac{1}{2}}, x_{\frac{1}{2}}, ..., x_{m-\frac{3}{2}}, x_{m-\frac{1}{2}}\},$$

which leads us to the same equations as (36) and (37) with m replaced by $m - \tfrac{1}{2}$. To illustrate
the procedure let n = 5, 7, 9; then the equations in $\lambda$ are found to be:

$$n = 5:\quad \lambda^2 - 13\lambda + 10 = 0, \qquad \lambda - 5 = 0, \tag{38}$$
$$n = 7:\quad \lambda^3 - 19\lambda^2 + 70\lambda - 14 = 0, \qquad \lambda^2 - 11\lambda + 14 = 0, \tag{39}$$
$$n = 9:\quad \lambda^4 - 25\lambda^3 + 167\lambda^2 - 246\lambda + 18 = 0, \qquad \lambda^3 - 17\lambda^2 + 63\lambda - 30 = 0. \tag{40}$$

It can be proved from (36) and (37) that each one of them gives one zero root for $\lambda$. The
equations (38) to (40) therefore give the remaining n - 2 = 2m - 1 roots in each case.
The equations for the characteristic roots obtained above provide a useful check on the
expressions for moments obtained by elaborate algebra in § 2.2 above. Let $\lambda_i$ $(i = 1, 2, ..., n-2)$
be the characteristic roots of A; then by an orthogonal transformation we can transform

$$\sum_{i=1}^{n-2} (\Delta^2 x_i)^2 = xAx' = \sum_{i=1}^{n-2} \lambda_i z_i^2, \tag{41}$$

where $z_i$ are independent normal variates with $\sigma(z_i) = \sigma(x_i)$. Taking $\sigma(x_i) = 1$, the
characteristic function of $\sum(\Delta^2 x_i)^2$ can be expressed as

$$\phi(t) = \frac{1}{(2\pi)^{\frac{1}{2}(n-2)}} \int \cdots \int \exp\big[-\tfrac{1}{2}\textstyle\sum z_i^2 + it\sum\lambda_i z_i^2\big] \prod dz_i = \prod_{r=1}^{n-2} (1 - 2i\lambda_r t)^{-\frac{1}{2}}. \tag{42}$$

The semi-invariants or cumulants of $\sum(\Delta^2 x_i)^2$ can therefore be written down from $\phi(t)$.
The first four moments can be shown to be

$$\mu_1' = \sum\lambda_i, \quad \mu_2 = 2\sum\lambda_i^2, \quad \mu_3 = 8\sum\lambda_i^3, \quad \mu_4 = 3\mu_2^2 + 48\sum\lambda_i^4. \tag{43}$$

For instance for n = 9, from (40) we have

$$\sum\lambda_i = 42, \quad \sum\lambda_i^2 = 454, \quad \sum\lambda_i^3 = 5628, \quad \sum\lambda_i^4 = 74102,$$

which confirm the formulae (8) obtained in § 2.2 above.
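
The power sums of the characteristic roots quoted for n = 9 can be reproduced without solving the characteristic equation, since the sum of the k-th powers of the roots equals the trace of the k-th power of the matrix. The sketch below is illustrative (hypothetical function names): it assembles the matrix of the quadratic form from the coefficients (1, -2, 1) in integer arithmetic and takes traces of its powers.

```python
def quadratic_form_matrix(n):
    """Matrix A with x A x' equal to the sum of squared second differences of x_1..x_n."""
    coeffs = (1, -2, 1)
    A = [[0] * n for _ in range(n)]
    for i in range(n - 2):            # one rank-one contribution per second difference
        for a in range(3):
            for b in range(3):
                A[i + a][i + b] += coeffs[a] * coeffs[b]
    return A

def power_sums(A, kmax=4):
    """Return [trace(A), trace(A^2), ..., trace(A^kmax)], i.e. sums of powers of the roots."""
    n = len(A)
    P = [row[:] for row in A]
    sums = [sum(P[i][i] for i in range(n))]
    for _ in range(kmax - 1):
        P = [[sum(P[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
        sums.append(sum(P[i][i] for i in range(n)))
    return sums

print(power_sums(quadratic_form_matrix(9)))
```

The two zero roots contribute nothing to the traces, so the sums run effectively over the n - 2 non-zero roots.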
We may point out the similarity of the problem of solving the difference equation (28)
with boundary conditions (29) to the problem of transverse vibrations in a rod with both
ends clamped (see, for example, R. Courant & D. Hilbert, 1930, p. 253).


Finally, I wish to express my sincere thanks to Prof. E. S. Pearson and Dr H. O. Hartley
for their advice in the preparation of this paper.
THE STATISTICAL TREATMENT OF MEAN DEVIATION


By J. H. CADWELL 
Ordnance Board 


1. INTRODUCTION


In small samples it is easier to compute the mean deviation than to find the mathematically 
more tractable standard deviation. As a consequence, mean deviation is the generally used 
measure of dispersion in some types of work. 

In the case of normal variation, the standard deviation, when corrected for bias, is the
most efficient estimator of population standard deviation. However, in small samples the
relative efficiency of mean deviation is high; for example, it is 91 % for samples of ten.

A more serious drawback of mean deviation is the rather troublesome form of its
distribution for samples from a normal population. As a result of this, counterparts to methods
available for standard deviations have not been developed. For instance, while two sets of
variances can readily be compared because of the additive property of the $\chi^2$ distribution
and the existence of the F-ratio tables, the same problem for mean deviations has not been
solved. A normal approximation to the distribution of mean deviation can be used for large
sample sizes. It can also be used for the average of a number of independent values for a
fixed small sample size. However, this still leaves many situations where it is inapplicable.

It is the object of this note to give approximate methods applicable to some frequently 
needed procedures. The errors involved are not much greater than those caused by rounding 
the sample values of mean deviation to three significant figures. They should be tolerable 
in most cases. 

In §§ 2 and 3 the basis of the method is explained, and three applications are discussed in 
subsequent sections. 


2. THE BASIC APPROXIMATION 


In a previous paper (Cadwell, 1953) the use of a power of $\chi^2$ as an approximation to the
distribution of mean deviation is considered. This gives a result of the form

m/σ is approximately distributed like a power of $(\chi^2/c)$.

The variable power used, together with the degrees of freedom of $\chi^2$ and the multiplicative
constant, allow the first three moments of the two distributions to be matched.

The dependence of the power to be used on sample size restricts the method to sets of 
values of mean deviation with a common sample size. By choosing a suitable fixed power 
we can ensure agreement of the first two moments with a small discrepancy for the third 
moment. The ability to work with sets of varying sample size adds greatly to the flexibility 
of the approximation. 

We use m(k, n) to denote the average of k values of mean deviation, each for a sample of
size n from a normal population of standard deviation $\sigma$. We find that

$$c\,\{m(k, n)/\sigma\}^{1.8} \tag{1}$$

has approximately the $\chi^2$ distribution on $\nu$ degrees of freedom. The values of c and $\nu$ are
given by

$$\nu = \nu_0 + 0.196 - \frac{0.159}{\nu_0} + \ldots, \quad\text{where}\quad \nu_0 = \frac{0.617k}{V_n^2},$$

$$\log c = \log 2 + 1.8\,\{\log\Gamma(0.556 + \tfrac{1}{2}\nu) - \log\Gamma(\tfrac{1}{2}\nu) - \log E(m_n/\sigma)\}.$$

Here $E(m_n/\sigma)$ and $V_n$ are respectively the mean value and coefficient of variation of the mean
deviation in a sample of n. The value of $\nu$ can be rounded to 0.1 unit with little loss of
accuracy. It is advisable to retain three or four figures in c.
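
The transformation constants can be computed directly from these expressions. The sketch below is illustrative only: the function names are hypothetical, the exponent 0.556 is taken as exactly 1/1.8, logarithms are natural, and the mean and variance of the mean deviation in a normal sample are taken from the closed forms given in § 5. The value of $\nu$ is rounded to 0.1 before c is computed.

```python
import math

def mean_md(n):
    """E(m_n / sigma) for the mean deviation of a normal sample of size n."""
    return math.sqrt(2.0 * (n - 1) / (n * math.pi))

def var_md(n):
    """var(m_n / sigma) for the mean deviation of a normal sample of size n."""
    bracket = math.pi / 2.0 + math.sqrt(n * (n - 2)) - n + math.asin(1.0 / (n - 1))
    return 2.0 * (n - 1) / (n * n * math.pi) * bracket

def transformation_constants(n, k):
    """Return (nu, c) such that c * (m(k, n) / sigma) ** 1.8 is roughly chi-square on nu d.f."""
    v_squared = var_md(n) / mean_md(n) ** 2          # squared coefficient of variation
    nu0 = 0.617 * k / v_squared
    nu = round(nu0 + 0.196 - 0.159 / nu0, 1)
    log_c = math.log(2.0) + 1.8 * (
        math.lgamma(1.0 / 1.8 + nu / 2.0) - math.lgamma(nu / 2.0) - math.log(mean_md(n))
    )
    return nu, math.exp(log_c)

print(transformation_constants(5, 1))
print(transformation_constants(5, 2))
```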

Table 1 gives values of c and $\nu$ for n from 4 to 10 and k from 1 to 10. Table 2 gives values
for n from 10 to 50 by steps of 5, and for k from 1 to 5. The value of $\nu$ is found by linear
interpolation for n, and then rounded to 0.1 (or to an integer for values of $\nu$ greater than 20).
The constant c is determined by linear interpolation for this value of $\nu$, i.e. c is regarded as
a function of $\nu$ and not of n.


3. ACCURACY OF THE APPROXIMATION 


The power of $\chi^2$ required for a three-moment fit runs from 0.5 when n = 2 to 0.594 when
n = ∞. Empirical work shows that a single value of 0.556 (reciprocal 1.8) gives a reasonable
compromise for values of n greater than 3 (see Table 1 of my previous paper (Cadwell,
1953)). By using Godwin & Hartley's tables (1945) we can examine the accuracy of the
approximation (1) up to n = 10. Beyond this, the more refined method of my previous paper
gives a measure of error due to the present approximation. Using (1) and tables of the $\chi^2$
integral we evaluate

Pr {mean deviation ≤ $m_0\sigma$}

approximately and compare the result with the true value.

The error is found to decrease for sample sizes from 4 up to about 7. It then increases again
up to sample size 15, thereafter tending steadily to zero. Errors are shown below for
n = 5, 10:

n = 5    m_0           0.17      0.32      0.49      0.70      0.93      1.19      1.51
         True value    0.0050    0.0527    0.2100    0.5122    0.7978    0.9510    0.9951
         Error        -0.0002   -0.0003   +0.0006   +0.0002   -0.0004    0.0000   +0.0001

n = 10   m_0           0.33      0.46      0.59      0.75      0.91      1.09      1.30
         True value    0.0046    0.0474    0.1933    0.5074    0.7950    0.9520    0.9951
         Error        +0.0003   +0.0010   +0.0006   -0.0013   -0.0011    0.0000   +0.0001








When k exceeds 1 no exact values are available; the following procedure was therefore 
adopted. 

Using series expansions for $\beta_1$ and $\beta_2$ due to Geary, and given in an Editorial
Note to Godwin & Hartley's paper (1945), we find the $\beta_1$ and $\beta_2$ differences of the
distributions of:

$(\chi^2/c)^{0.556}$ and m(k, n).

Results are shown below for n = 5, 10 and k = 1 to 4:

  n    k    β_1(m)    β_2(m)    β_1 difference    β_2 difference
  5    1    0.230     3.197        -0.002            -0.020
       2    0.114     3.098        -0.007            -0.029
       3    0.077     3.066        -0.006            -0.026
       4    0.057     3.049        -0.005            -0.019

 10    1    0.106     3.093        -0.009            -0.034
       2    0.053     3.046        -0.006            -0.019
       3    0.035     3.031        -0.004            -0.013
       4    0.026     3.023        -0.003            -0.009

Apart from an increase from k = 1 to k = 2 when n = 5, differences in the $\beta$'s decrease as
k increases. It will be seen that n = 5, k = 2 and n = 10, k = 1 have roughly the same $\beta$
values and give nearly the same $\beta$ differences. Thus errors in the first case should be similar
to those in the second and these have been examined above.


4. TESTS OF HOMOGENEITY


Given a set of k mean deviations supposedly for samples from normal populations of
a fixed standard deviation, we may suspect that the variability present is greater than can
be attributed to sampling fluctuations. In order to test this point we use (1) to transform the
set of mean deviations to a fixed multiple of approximate $\chi^2$ values on $\nu_1, \nu_2, ..., \nu_k$ degrees
of freedom respectively. We can now apply the Neyman-Pearson L test with Bartlett's
modification to test the homogeneity of these $\chi^2$ values.

When all samples are of the same size n, a rapid but somewhat less powerful test is avail- 
able. We evaluate the ratio of maximum and minimum values in the set. If this ratio 
exceeds the value given in Table 5 of my previous paper (Cadwell, 1953, p. 346), we assume 
that some sort of heterogeneity is present. This procedure will result in an incorrect decision 
in approximately 5 % of cases when the values are homogeneous. Table 6 of the same paper 
gives the corresponding 1 % points for this ratio. The exact probability will seldom differ 
from the nominal value by as much as 0-2 % when using these tables. 


5. ESTIMATION OF STANDARD DEVIATION


Suppose we have a set of values $m_1, m_2, ..., m_k$ from populations of fixed dispersion. Using
the additive property of $\chi^2$, we shall have:

$\sum c_i (m_i/\sigma)^{1.8}$ has approximately the $\chi^2$ distribution on $\sum\nu_i$ d.f.,

where $c_i$ and $\nu_i$ are obtained from the first column of Table 1 to correspond to the sample
size of the ith mean deviation. This will furnish the estimate

$$\hat{\sigma} = \Big\{\frac{\sum c_i m_i^{1.8}}{\sum \nu_i}\Big\}^{0.556}.$$

Confidence limits for $\sigma$ can readily be found from tables of the $\chi^2$ integral. Thus the true
value will be less than

$$\Big\{\frac{\sum c_i m_i^{1.8}}{\chi^2_{0.05}}\Big\}^{0.556}$$

on 95 % of occasions. If we require limits that include the true value on 95 % of occasions,
we shall take

$$\Big\{\frac{\sum c_i m_i^{1.8}}{\chi^2_{0.975}}\Big\}^{0.556} \quad\text{and}\quad \Big\{\frac{\sum c_i m_i^{1.8}}{\chi^2_{0.025}}\Big\}^{0.556}.$$

When all k samples have the same size n, we can use the alternative estimate

$$\hat{\sigma} = \frac{m(k, n)}{E(m_n/\sigma)}, \quad\text{where}\quad E(m_n/\sigma) = \sqrt{\frac{2(n-1)}{n\pi}}.$$

Values of $E(m_n/\sigma)$ are given in Tables 1 and 2. Tables 3 (a)-(d) enable either one- or
two-sided 95 % limits to be found. Thus on 95 % of occasions the true value of $\sigma$ will lie between

$m(k, n)/M_{0.975}(k, n)$ and $m(k, n)/M_{0.025}(k, n)$.

Errors in Table 3 will seldom exceed two units in the third decimal.

Should it be necessary to deal with values of k beyond 10 a normal approximation to the
distribution of m(k, n) can be used; errors in the 2.5 % levels will not exceed 0.01 with n
as small as 5.

The expected value is given above and the variance is given by

$$\operatorname{var}\frac{m(k, n)}{\sigma} = \frac{1}{k}\operatorname{var}\frac{m(1, n)}{\sigma} = \frac{2(n-1)}{kn^2\pi}\Big\{\tfrac{1}{2}\pi + \sqrt{n(n-2)} - n + \sin^{-1}\frac{1}{n-1}\Big\}.$$

Values of var m(1, n)/σ are given in Tables 1 and 2.
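
The equal-sample-size estimate and its sampling variance can be illustrated as follows. This is a sketch with hypothetical function names and simulated data, taking the mean deviation of a sample as the average absolute deviation from the sample mean with divisor n.

```python
import math
import random

def mean_deviation(sample):
    """Average absolute deviation from the sample mean (divisor n)."""
    centre = sum(sample) / len(sample)
    return sum(abs(v - centre) for v in sample) / len(sample)

def expected_ratio(n):
    """E(m_n / sigma) for normal samples of size n."""
    return math.sqrt(2.0 * (n - 1) / (n * math.pi))

def variance_ratio(n, k=1):
    """var(m(k, n) / sigma): variance of the average of k independent mean deviations."""
    bracket = math.pi / 2.0 + math.sqrt(n * (n - 2)) - n + math.asin(1.0 / (n - 1))
    return 2.0 * (n - 1) / (k * n * n * math.pi) * bracket

def estimate_sigma(samples):
    """Estimate sigma from k samples of common size n as m(k, n) / E(m_n / sigma)."""
    n = len(samples[0])
    average_md = sum(mean_deviation(s) for s in samples) / len(samples)
    return average_md / expected_ratio(n)

# Hypothetical data: 2000 samples of size 5 from a normal population with sigma = 3.
rng = random.Random(7)
samples = [[rng.gauss(10.0, 3.0) for _ in range(5)] for _ in range(2000)]
print(estimate_sigma(samples), expected_ratio(5), variance_ratio(5))
```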


6. COMPARISON OF TWO SETS OF MEAN DEVIATIONS 
It is often necessary to compare two sets of samples to see if the dispersion can be regarded 
as the same in the two cases. 


On the hypothesis of a common variance in the two sets, we shall have

$$\frac{\sum c_i m_i^{1.8}}{\sum c_j' (m_j')^{1.8}} \cdot \frac{\sum \nu_j'}{\sum \nu_i}$$

is approximately an F-ratio on $\sum\nu_i$ and $\sum\nu_j'$ degrees of freedom. Working at the 5 % level
we shall reject the hypothesis of a common variance if either this ratio or its reciprocal
exceeds the upper 2.5 % level of F on the appropriate numbers of degrees of freedom.
When the sample sizes are constant within the two sets, we have a simpler test based on

$$\frac{c\,\nu'}{c'\nu}\Big\{\frac{m(k, n)}{m(k', n')}\Big\}^{1.8}.$$

The values of c and $\nu$ are obtained from Table 1 entered with n and k and similarly for c'
and $\nu'$. This ratio and its reciprocal will be compared with the upper 2.5 % point of F on
$\nu$ and $\nu'$ degrees of freedom.
It will be seen that fractional degrees of freedom are involved. However, in the majority
of cases significance or non-significance can be settled without elaborate interpolation.
Where it can be used, the latter approximation is to be preferred. It involves two cases of
(1) for values of k usually greater than unity, while the other depends on a number of
applications of (1) each with k = 1. As the errors in (1) fall rapidly with increase of k, the
second approximation should be more accurate as well as being simpler to apply. The degrees
of freedom involved in the second method are a little less than in the first. However, the
difference in power of the two tests should be negligible. The same argument will apply to
the similar choice of methods that may arise in § 5.


I should like to thank Mr D. F. Mills for his help in the preparation of the tables. 
Acknowledgement is made to the Chief Scientist, Ministry of Supply, for permission to 
publish this paper. 
Table 1. Transformation constants, c and $\nu$. The upper figure
(in bold type) is that of $\nu$ and the lower is c

















16-7 | 22-3 | 27-8 | 33:3 | 388 


| 9336 | 19-05 | 28-77 | 38-67 | 48-40 | 58-13 | 67-86 | 77-59 | 87-50) 97-23 | 0-7284 | 0-05934 


| | j mi. mm 
re ee OE 6 7 s | 9 10 é(=) a. 
| } | | | C Cc 
| | | | 
enero) pre) prey 
4| 35 | 69 | 102 | 136 | 169 | 20-2 | 23-6 | 26-9 | 30-3 | 336 | 
oe ae | 19-00 | 25-61 | 32-02 38-44 | 45-06 | 51-47| 58-08| 64-50 | 0-6910 | 0-08818 
| | | 
5| 46 | 90 | 135 |17-9 | 22:3 | 268 | 31-2 | 35-6 | 40-1 | 44-5 
7-677 | 15-73 | 23-98 | 32-05 | 40-12 | 48-38 | 56-46| 64-53| 72-79| 80-86 | 0-7136 | 0-07094 
| | 
| 
| 





. | 20-0 | 26-6 | 33-2 | 398 46-4 | 53:0 | 59-6 | 66:2 
| 10-99 | 22-36 | 33-74 | 45-12 | 56-51 | 67-89 | 79-28 | 90-66 | 102-0 | 113-4 | 0-7387 | 0-05101 


~] 
oa 
i?) 
— 
& 
be 





| 
1386 | 463 | 549 


| 15-6 | 23-3 | 30-9 | | 
12-65 | 25-67 | 38-71 | 51-57 | 64-61 | 77-65 | 1168 | 129-8 | 0-7464 | 0-04473 
528 | 616 | 70-4 | 791 | 87-9 


| | } | 
87-40 | 102-1 116-8 | 131-3 | 146-0 | 0-7522 | 0-03982 





90-68 








| 9| 90 | 17-7 | 265 | 35:3 | 44-1 
14:30 | 28-82 43-50 | 58-19 | 72-88 | 








110 |100 | 199 | 298 | 39-6 | 49-5 | 59-3 | 692 | 79:0 | 889 | 98-7 


} 
15-79 | 32-12 | 48-47 | Boe 64 | 80-98 | 97-16 | 113-5 | 129-7 | 146-0 | 162-2 | 0-7569 | 0-03589 
| | 








Table 2. Transformation constants, c and v

    |    k = 1      |    k = 2      |    k = 3      |    k = 4      |    k = 5      |             |
  n |    v |     c  |     v |     c |     v |     c |     v |     c |     v |     c | E{m(k,n)/σ} | var{m(1,n)/σ}
 10 | 10.0 | 15.79  |  19.9 | 32.12 |  29.8 | 48.47 |  39.6 | 64.64 |  49.5 | 80.98 |      0.7569 |       0.03589
 15 | 15.5 | 24.07  |  30.7 | 48.34 |  46.0 | 72.78 |  61.2 | 97.07 |  76.5 | 121.5 |      0.7708 |       0.02403
 20 | 20.9 | 32.17  |  41.5 | 64.56 |  62.2 | 97.11 |  82.9 | 129.7 | 103.5 | 162.0 |      0.7777 |       0.01806
 25 | 26.3 | 40.28  |  52.3 | 80.78 |  78.4 | 121.4 | 104.5 | 162.1 | 130.5 | 202.6 |      0.7818 |       0.01446
 30 | 31.7 | 48.39  |  63.2 | 97.15 |  94.6 | 145.8 | 126.1 | 194.5 | 157.6 | 243.3 |      0.7845 |       0.01206
 35 | 37.1 | 56.50  |  74.0 | 113.4 | 110.8 | 170.1 | 147.7 | 226.9 | 184.6 | 283.8 |      0.7864 |       0.01035
 40 | 42.5 | 64.60  |  84.8 | 129.6 | 127.1 | 194.5 | 169.4 | 259.5 | 211.6 | 324.3 |      0.7878 |       0.00906
 45 | 47.9 | 72.71  |  95.6 | 145.8 | 143.3 | 218.9 | 191.0 | 292.0 | 238.6 | 364.9 |      0.7890 |       0.00805
 50 | 53.3 | 80.82  | 106.4 | 162.0 | 159.5 | 243.2 | 212.6 | 324.4 | 265.7 | 405.6 |      0.7899 |       0.00725

The symbol m(k,n) denotes the average of k independent mean deviations for samples of size n from
normal populations of standard deviation σ.

In Tables 1 and 2 the last two columns give the expected value of m(k,n)/σ and the variance of
m(1,n)/σ respectively. In addition we have the result:


c{m(k,n)/σ}^1.8 has approximately the χ² distribution on v d.f.
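The approximation just stated can be turned into percentage points of m(k,n). The following is a minimal sketch, not the author's procedure: it assumes that the exponent in the result above is 1.8, takes v and c for n = 10, k = 1 from Table 2, and uses the Wilson-Hilferty formula (an outside assumption) for the χ² point. The function names are hypothetical.

```python
from statistics import NormalDist


def chi2_point_wilson_hilferty(prob, df):
    """Approximate chi-square quantile (Wilson-Hilferty); an assumption of this sketch."""
    z = NormalDist().inv_cdf(prob)
    spread = 2.0 / (9.0 * df)
    return df * (1.0 - spread + z * spread ** 0.5) ** 3


def mean_deviation_point(prob, c, v, sigma=1.0, power=1.8):
    """Point of m(k, n) at which c * (m / sigma) ** power equals the chi-square point on v d.f."""
    return sigma * (chi2_point_wilson_hilferty(prob, v) / c) ** (1.0 / power)


# Constants for n = 10, k = 1 as read from Table 2: v = 10.0, c = 15.79.
lower_2_5 = mean_deviation_point(0.025, c=15.79, v=10.0)
upper_2_5 = mean_deviation_point(0.975, c=15.79, v=10.0)
print(round(lower_2_5, 3), round(upper_2_5, 3))
```

The two values may be set against the n = 10, k = 1 entries of Tables 3(a) and 3(d), which are 0.417 and 1.156; small discrepancies are to be expected from the approximate χ² points.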
Table 3. Percentage points of m(k,n), the averages of k independent mean deviations, each for
a sample of n from a normal population of unit standard deviation

(a) Lower 2.5% points

  n | k = 1 |     2 |     3 |     4 |     5 |     6 |     7 |     8 |     9 |    10
  4 | 0.199 | 0.323 | 0.382 | 0.420 | 0.446 | 0.466 | 0.482 | 0.495 | 0.505 | 0.51
  5 | 0.260 | 0.376 | 0.433 | 0.468 | 0.492 | 0.510 | 0.525 | 0.536 | 0.546 | 0.55
  6 | 0.306 | 0.416 | 0.469 | 0.502 | 0.525 | 0.542 | 0.555 | 0.566 | 0.575 | 0.58
  7 | 0.342 | 0.447 | 0.497 | 0.528 | 0.549 | 0.565 | 0.577 | 0.587 | 0.596 | 0.60
  8 | 0.372 | 0.472 | 0.519 | 0.548 | 0.568 | 0.583 | 0.595 | 0.604 | 0.612 | 0.619
  9 | 0.396 | 0.492 | 0.537 | 0.565 | 0.584 | 0.598 | 0.609 | 0.618 | 0.625 | 0.632
 10 | 0.417 | 0.509 | 0.552 | 0.578 | 0.597 | 0.610 | 0.621 | 0.629 | 0.636 | 0.642

(b) Lower 5% points

  n | k = 1 |     2 |     3 |     4 |     5 |     6 |     7 |     8 |     9 |    10
  4 | 0.254 | 0.371 | 0.425 | 0.459 | 0.482 | 0.499 | 0.513 | 0.524 | 0.533 | 0.541
  5 | 0.315 | 0.422 | 0.473 | 0.503 | 0.525 | 0.541 | 0.553 | 0.563 | 0.572 | 0.579
  6 | 0.360 | 0.460 |       | 0.535 | 0.555 | 0.570 | 0.581 | 0.590 | 0.598 | 0.604
  7 | 0.394 | 0.489 | 0.532 | 0.559 | 0.576 | 0.591 | 0.601 | 0.610 | 0.618 | 0.624
  8 | 0.422 | 0.512 | 0.553 | 0.578 | 0.595 | 0.608 | 0.618 | 0.626 | 0.633 | 0.638
  9 | 0.445 | 0.529 | 0.569 | 0.593 | 0.609 | 0.621 | 0.631 | 0.638 | 0.645 | 0.650
 10 | 0.464 | 0.545 | 0.583 | 0.605 | 0.621 | 0.633 | 0.642 | 0.649 | 0.655 | 0.660

(c) Upper 5% points

  n | k = 1 |     2 |     3 |     4 |     5 |     6 |     7 |     8 |     9 |    10
  4 | 1.224 | 1.057 | 0.988 | 0.945 | 0.918 | 0.898 | 0.882 | 0.869 | 0.858 | 0.850
  5 | 1.187 | 1.041 | 0.978 | 0.941 | 0.916 | 0.898 | 0.884 | 0.873 | 0.863 | 0.856
  6 | 1.158 | 1.026 | 0.969 | 0.935 | 0.913 | 0.897 | 0.884 | 0.874 | 0.865 | 0.858
  7 | 1.135 | 1.013 | 0.961 | 0.930 | 0.910 | 0.894 | 0.882 | 0.873 | 0.865 | 0.859
  8 | 1.116 | 1.002 | 0.954 | 0.925 | 0.906 | 0.892 | 0.881 | 0.872 | 0.864 | 0.858
  9 | 1.100 | 0.994 | 0.948 | 0.921 | 0.903 | 0.889 | 0.879 | 0.870 | 0.864 | 0.858
 10 | 1.086 | 0.985 | 0.942 | 0.917 | 0.899 | 0.887 | 0.877 | 0.869 | 0.863 | 0.857

(d) Upper 2.5% points

  n | k = 1 |     2 |     3 |     4 |     5 |     6 |     7 |     8 |     9 |    10
  4 | 1.344 | 1.137 | 1.051 | 0.999 | 0.966 | 0.941 | 0.921 | 0.906 | 0.893 | 0.882
  5 | 1.292 | 1.111 | 1.033 | 0.989 | 0.958 | 0.936 | 0.919 | 0.905 | 0.894 | 0.884
  6 | 1.253 | 1.089 | 1.020 | 0.978 | 0.951 | 0.931 | 0.915 | 0.903 | 0.892 | 0.884
  7 | 1.222 | 1.071 | 1.007 | 0.970 | 0.944 | 0.926 | 0.911 | 0.900 | 0.891 | 0.883
  8 | 1.196 | 1.056 | 0.996 | 0.962 | 0.938 | 0.921 | 0.908 | 0.897 | 0.888 | 0.881
  9 | 1.175 | 1.044 | 0.988 | 0.955 | 0.933 | 0.917 | 0.904 | 0.894 | 0.886 | 0.879
 10 | 1.156 | 1.033 | 0.980 | 0.949 | 0.928 | 0.913 | 0.901 | 0.891 | 0.884 | 0.877

The values in Tables 3(a)-(d) are exact for k = 1. For other values of k, the error will not exceed
0.003 and will usually be much smaller.
TESTS OF LINEAR HYPOTHESES IN UNIVARIATE AND
MULTIVARIATE ANALYSIS WHEN THE RATIOS OF THE 
POPULATION VARIANCES ARE UNKNOWN 


By G. S. JAMES 
University of Leeds 


1. INTRODUCTION 


In a previous paper (James, 1951) I have attempted to generalize B. L. Welch’s method of 
comparing two means when the ratio of the population variances is unknown (Welch, 
1947a, b; Aspin, 1948, 1949) to the case where one requires a test for the equality of several
means. Welch (1951) has derived a similar test which makes use of tables of the variance
ratio instead of those of χ², and has shown that the two tests are equivalent to the first
order of inverse powers of the sample sizes, which is the order to which he worked. These 
results can easily be applied to certain other problems, such as comparison of the slopes 
of regression lines, but these extensions are, from the mathematical point of view, trivial, 
for they merely depend on our ability to calculate from the data quantities which are 
distributed in forms to which the original arguments can be applied. 

The essential difference between Welch’s original problem (considered as a test of sig- 
nificance) and my own was that the linear hypothesis to be tested imposed one constraint 
in his case and several in mine. However, the problem of several means is a very special 
one, so in § 2 of the present paper we consider a more general set-up. §3 consists of simple 
illustrations of this general work. In § 4 applications are made to a simple factorial experi- 
mental lay-out, in which no assumption about constancy of the error variances can be made, 
but in which they can be individually estimated from replicate determinations. The work 
of § 2 is generalized to the multivariate case in § 5, and in §§ 6 and 7 the results are applied 
to the multivariate problems of two and several samples, respectively. The two-sample 
problem is the multivariate analogue of the Behrens-Fisher problem, as treated by Welch. 
§8 consists of numerical illustrations of the multivariate tests of §§ 6 and 7. 

The reader’s attention is drawn to the remarks at the end of § 4; it is hoped to investigate 
the problem further along the lines suggested there, and to publish any reasonably simple 
techniques found for dealing with the sort of problem which has been encountered in prac- 
tice. Meanwhile, it will be noted that §4 is not essential to the understanding of the rest 
of the paper. 


2. A UNIVARIATE LINEAR HYPOTHESIS 


Let $x_1, \ldots, x_n$ be n random variables whose mean values are known linear functions of k
unknown parameters $\theta_1, \ldots, \theta_k$, and which are distributed independently and normally
about these mean values with unknown variances $\alpha_1, \ldots, \alpha_n$. In matrix notation we may
write

$$\mathcal{E}(\mathbf{x}) = \mathbf{B}\boldsymbol{\theta}, \tag{2-1}$$

where $\mathbf{x}$ and $\boldsymbol{\theta}$ are the column vectors $\{x_1, \ldots, x_n\}$ and $\{\theta_1, \ldots, \theta_k\}$, and $\mathbf{B} = (b_{ij})$ is an $n \times k$
matrix of known constants.† (We assume that the rank of $\mathbf{B}$ is k.) Let $a_i$ $(i = 1, \ldots, n)$ be
unbiased estimates of the corresponding $\alpha_i$, based on $\nu_i$ degrees of freedom, and distributed
independently of the $x_i$ and of each other in the usual $\chi^2$-type distribution.

† Any term independent of the $\theta_i$ which would appear on the right-hand side of (2-1) can be
incorporated into $\mathbf{x}$ by a transformation.

We now suppose that the null hypothesis to be tested is that $\theta_1 = \ldots = \theta_r = 0$ (r being
some number between 1 and k), the alternative hypothesis being that the whole set of parameters
$\theta_1, \ldots, \theta_k$ is entirely unrestricted. Often an apparently more general linear hypothesis
occurs, in which the parameters have constraints imposed upon them even under the alternative
hypothesis, and in which the null hypothesis states that the parameters satisfy a
further set of r linearly independent linear relations; but this type of hypothesis may always
be reduced to the above canonical form by means of suitable linear transformations of the
parameters. (See, for example, Wilks (1946, pp. 171-3); examples of the reduction occur
in later sections of this paper.) If we partition $\boldsymbol{\theta}$ according to its first r and last (k − r)
elements, thus: $\boldsymbol{\theta} = \{\boldsymbol{\theta}_1, \boldsymbol{\theta}_2\}$, then the above null hypothesis may be written as $\boldsymbol{\theta}_1 = \mathbf{0}$.

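Now suppose for a moment that the $\alpha_i$ were known. Then, under the alternative hypothesis,
the value of $\boldsymbol{\theta}$ which maximized the likelihood of the observed $\mathbf{x}$ would be

$$\hat{\boldsymbol{\theta}} = \mathbf{C}^{-1}\mathbf{d}, \tag{2-2}$$

where

$$\mathbf{C} = \mathbf{B}'\boldsymbol{\alpha}^{-1}\mathbf{B}, \qquad \mathbf{d} = \mathbf{B}'\boldsymbol{\alpha}^{-1}\mathbf{x}, \qquad \boldsymbol{\alpha} = \operatorname{diag}(\alpha_1, \ldots, \alpha_n). \tag{2-3}$$

Also the variance matrix of the estimator $\hat{\boldsymbol{\theta}}$ would be

$$\operatorname{var}\hat{\boldsymbol{\theta}} = \mathbf{C}^{-1}. \tag{2-4}$$
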

Moreover, it is known that, under these circumstances, the test for the null hypothesis
derived by the likelihood-ratio method would depend on the fact that, when this hypothesis
is true,

$$\hat{\boldsymbol{\theta}}_1'\mathbf{V}^{-1}\hat{\boldsymbol{\theta}}_1 \text{ is distributed as } \chi^2 \text{ on } r \text{ d.f.}, \tag{2-5}$$

where $\mathbf{V}$ is the variance matrix of $\hat{\boldsymbol{\theta}}_1$ (namely, the first r rows and columns of $\mathbf{C}^{-1}$). That is
to say, writing $\mathbf{t}$ in place of $\hat{\boldsymbol{\theta}}_1$ for brevity, and using the notation $\mathbf{t}'\mathbf{V}^{-1}\mathbf{t}(\alpha)$ as a reminder
that both $\mathbf{t}$ and $\mathbf{V}$ are, in general, functions of the $\alpha_i$,

$$\Pr[\mathbf{t}'\mathbf{V}^{-1}\mathbf{t}(\alpha) < 2\xi] = G_p(\xi), \tag{2-6}$$

where $2\xi$ denotes the tabled value of $\chi^2$ for a particular probability or significance level,
$p = \tfrac{1}{2}r$ and

$$G_p(\xi) = \{\Gamma(p)\}^{-1}\int_0^{\xi} e^{-u}u^{p-1}\,du. \tag{2-7}$$

(The symbols $\xi$ and p are used in place of $\chi^2$ and r, and the formula (2-6) written in terms of
$\xi$ rather than in terms of the corresponding probability or significance level, merely as a
matter of later mathematical convenience. The final results will be translated back into
familiar form.)
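As a numerical illustration of the function G_p(ξ) of (2-7), the sketch below evaluates it by the standard power series for the incomplete gamma ratio and uses it to recover the χ² distribution function through p = r/2 and ξ = χ²/2. The function names `G` and `chi2_cdf` are hypothetical and the series method is a choice made here, not something taken from the paper.

```python
import math


def G(p, xi, terms=200):
    """G_p(xi) = (1 / Gamma(p)) * integral from 0 to xi of e^{-u} u^{p-1} du, by power series."""
    if xi <= 0.0:
        return 0.0
    term = 1.0 / math.gamma(p + 1.0)  # n = 0 term of the sum of xi^n / Gamma(p + n + 1)
    total = term
    for n in range(1, terms):
        term *= xi / (p + n)
        total += term
    return math.exp(-xi) * xi ** p * total


def chi2_cdf(value, r):
    """Pr[chi-square on r d.f. < value], written as G_p(xi) with p = r/2 and xi = value/2."""
    return G(0.5 * r, 0.5 * value)


# The 5% point of chi-square on 2 d.f. is -2 log(0.05).
print(chi2_cdf(-2.0 * math.log(0.05), 2))
```

For r = 2 the function reduces to 1 − e^{−ξ}, and for r = 1 to erf(√ξ), which gives two easy checks on the series.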

As the $\alpha_i$ are not in fact known, we may, if their estimates $a_i$ are based on large numbers
of degrees of freedom, use as an approximation the result

$$\Pr[\mathbf{t}'\mathbf{V}^{-1}\mathbf{t}(a) < 2\xi] = G_p(\xi), \tag{2-8}$$

where $\mathbf{t}(a)$ and $\mathbf{V}(a)$ are the same functions of the $a_i$ as $\mathbf{t}(\alpha)$ and $\mathbf{V}(\alpha)$ are of the $\alpha_i$. This
suggests that in the general case we try to find a function $h(a)$ of the $a_i$ (and of $\xi$) such that

$$\Pr[\mathbf{t}'\mathbf{V}^{-1}\mathbf{t}(a) < 2h(a)] = G_p(\xi). \tag{2-9}$$

In large samples $2h(a)$ will approach $2\xi = \chi^2$, and we shall hope, as in previous work, to be
able to write $h(a)$ as a series of terms of decreasing order of magnitude, as in (2-13) below.

Carrying out two successive symbolic Taylor expansions, (2-9) leads in the usual manner
(see James (1951) or Welch (1947a)) first to 








@ Pr[t’V—t(a) < 2h(a)] = G,(E), (2-10) 
and then to O exp [h,(a) D] Pr[t’V—-t(a) < 2¢] = G,(), (2-11) 
where © = I [e-*i%(1 — 2a, 8,/v,)-**] 
4 1 202\2 
= 14594 mea ee = +5 ( =) |+00-) (2-12) 


Me = §+h,(a) 
= £+h,(a)+h,(a) + (2-13) 


(h,(a) being of order v-*), and where 0; denotes 0/0a; and D denotes 0/0. Substituting these 
expressions for © and h,(a) into (2-11) and equating terms of successive orders, we obtain, 
exactly as before, 


Kc x) D+ 5%" i) Pe (e'V- At(x) < 2€] = 0, (2-14) 


Ko a) D+ $a) D+ ES qoac ) D+ 2h) d, D+ hy (ax) 03D) 


aa 1 


oy ty 


a2? 
(24 *5 <i)" Prev —It(ax) < 2] = 0, (2-15) 
i 
as equations giving h,(a), h(a), and hence h,(a), h,(a). Here h(a) denotes 0;h,(«), A¥?(a) 
denotes 0;0;h,(«), and so on. 
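We employ the method used previously to evaluate derivatives like $\partial_i^2 \Pr[\ldots]$. Let

$$J = \Pr[\mathbf{t}'\mathbf{V}^{-1}\mathbf{t}(\alpha + \epsilon) < 2\xi]. \tag{2-16}$$

Then, by Taylor's theorem,

$$J = [1 + \Sigma\,\epsilon_i\partial_i + \tfrac{1}{2}\Sigma\,\epsilon_i\epsilon_j\partial_i\partial_j + \ldots]\,\Pr[\mathbf{t}'\mathbf{V}^{-1}\mathbf{t}(\alpha) < 2\xi]. \tag{2-17}$$

We also have, by definition,

$$J = (2\pi)^{-\frac{1}{2}r}\,|\operatorname{var}\mathbf{t}(\alpha+\epsilon)|^{-\frac{1}{2}} \int_{\boldsymbol{\tau}'\mathbf{V}^{-1}(\alpha+\epsilon)\boldsymbol{\tau} < 2\xi} \exp[-\tfrac{1}{2}\boldsymbol{\tau}'(\operatorname{var}\mathbf{t}(\alpha+\epsilon))^{-1}\boldsymbol{\tau}]\,d\boldsymbol{\tau}, \tag{2-18}$$

where $\int d\boldsymbol{\tau}$ denotes integration over the region indicated in the space of $\tau_1, \ldots, \tau_r$ (i.e. the
space of realizable values of the random variables $t_1, \ldots, t_r$). It should be noted that
$\operatorname{var}\mathbf{t}(\alpha+\epsilon)$ is not the same thing as $\mathbf{V}(\alpha+\epsilon)$, which is $\operatorname{var}\mathbf{t}(\alpha)$ with $\alpha_i + \epsilon_i$ substituted for
the $\alpha_i$; for the original random variables $x_i$, of which $\mathbf{t}(\alpha)$ is a function, are still supposed to
have variances $\alpha_i$, not $\alpha_i + \epsilon_i$. Now in (2-18) we make a real, non-singular linear transformation

$$\boldsymbol{\tau} = \mathbf{T}\mathbf{u} \tag{2-19}$$

to new variables $u_i$, which is such that

$$\tfrac{1}{2}\mathbf{T}'\mathbf{V}^{-1}(\alpha+\epsilon)\mathbf{T} = \mathbf{I}, \tag{2-20}$$

$$\tfrac{1}{2}\mathbf{T}'(\operatorname{var}\mathbf{t}(\alpha+\epsilon))^{-1}\mathbf{T} = \mathbf{I} - \boldsymbol{\eta}, \tag{2-21}$$

where $\mathbf{I}$ is the unit matrix and $\boldsymbol{\eta}$ is a diagonal matrix:

$$\boldsymbol{\eta} = \operatorname{diag}(\eta_1, \ldots, \eta_r). \tag{2-22}$$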
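$\mathbf{T}$ will be a function of the $\alpha_i$ and $\epsilon_i$. Under this transformation (2-18) becomes

$$J = \pi^{-\frac{1}{2}r}\,|\mathbf{I} - \boldsymbol{\eta}|^{\frac{1}{2}} \int_{\mathbf{u}'\mathbf{u} < \xi} \exp[-\mathbf{u}'(\mathbf{I} - \boldsymbol{\eta})\mathbf{u}]\,d\mathbf{u}. \tag{2-23}$$

We now expand the factor $\exp[\mathbf{u}'\boldsymbol{\eta}\mathbf{u}] = \exp[\Sigma\,\eta_i u_i^2]$ appearing in the integrand in powers of
the $\eta_i$ and integrate term by term. The individual integrals are of Dirichlet's type, and we
obtain, exactly as in the paper previously cited (equation (26)),†

$$J = \left(\frac{|\mathbf{I} - \boldsymbol{\eta}H|}{|\mathbf{I} - \boldsymbol{\eta}|}\right)^{-\frac{1}{2}} G_p(\xi), \tag{2-24}$$

where we again use the operational notations $HG_p(\xi) = G_{p+1}(\xi)$, $\Delta = H - 1$. Now, using
equations (2-20) and (2-21), we have

$$\frac{|\mathbf{I} - \boldsymbol{\eta}H|}{|\mathbf{I} - \boldsymbol{\eta}|} = \frac{|\mathbf{I} - \boldsymbol{\eta} - \boldsymbol{\eta}\Delta|}{|\mathbf{I} - \boldsymbol{\eta}|} = \frac{|\operatorname{var}^{-1}\mathbf{t}(\alpha+\epsilon) - [\mathbf{V}^{-1}(\alpha+\epsilon) - \operatorname{var}^{-1}\mathbf{t}(\alpha+\epsilon)]\Delta|}{|\operatorname{var}^{-1}\mathbf{t}(\alpha+\epsilon)|}$$

$$= |\mathbf{I} - [\mathbf{V}^{-1}(\alpha+\epsilon)\,.\,\operatorname{var}\mathbf{t}(\alpha+\epsilon) - \mathbf{I}]\Delta|. \tag{2-25}$$

Thus, from (2-24),

$$J = |\mathbf{I} - \mathbf{X}\Delta|^{-\frac{1}{2}}\,G_p(\xi), \tag{2-26}$$

where

$$\mathbf{X} = \mathbf{V}^{-1}(\alpha+\epsilon)\,.\,\operatorname{var}\mathbf{t}(\alpha+\epsilon) - \mathbf{I}. \tag{2-27}$$
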

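Such expressions as $|\mathbf{I} - \mathbf{X}\Delta|^{-\frac{1}{2}}$ are not easily expanded directly in a form suitable to our
needs, but we do have the special result (an analogue of the ordinary logarithmic series)

$$-\log|\mathbf{I} - \mathbf{Y}| = \operatorname{tr}\mathbf{Y} + \tfrac{1}{2}\operatorname{tr}\mathbf{Y}^2 + \tfrac{1}{3}\operatorname{tr}\mathbf{Y}^3 + \ldots. \tag{2-28}$$

If $\mathbf{Y}$ is a numerical matrix this series is convergent if all the eigenvalues of $\mathbf{Y}$ are numerically
less than unity; a proof of this result will be presented in §5. Using it in (2-26) we have

$$J = \exp(-\tfrac{1}{2}\log|\mathbf{I} - \mathbf{X}\Delta|)\,G_p(\xi)$$
$$= \exp(\tfrac{1}{2}\operatorname{tr}\mathbf{X}\,.\,\Delta + \tfrac{1}{4}\operatorname{tr}\mathbf{X}^2\,.\,\Delta^2 + \ldots)\,G_p(\xi)$$
$$= [1 + \tfrac{1}{2}\operatorname{tr}\mathbf{X}\,.\,\Delta + \{\tfrac{1}{4}\operatorname{tr}\mathbf{X}^2 + \tfrac{1}{8}(\operatorname{tr}\mathbf{X})^2\}\Delta^2 + \ldots]\,G_p(\xi). \tag{2-29}$$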
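The logarithmic series (2-28) used in this step is easy to check numerically. The sketch below does so for an arbitrary 2 × 2 matrix whose eigenvalues are well inside the unit circle; the matrix `Y` and all helper names are illustrative assumptions, not quantities from the paper.

```python
import math


def matmul(a, b):
    """Product of two 2 x 2 matrices held as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def trace(a):
    return a[0][0] + a[1][1]


def neg_log_det_i_minus(y):
    """Left-hand side of the series: -log |I - Y| for a 2 x 2 matrix Y."""
    det = (1.0 - y[0][0]) * (1.0 - y[1][1]) - y[0][1] * y[1][0]
    return -math.log(det)


def trace_series(y, terms=60):
    """Right-hand side: tr Y + (1/2) tr Y^2 + (1/3) tr Y^3 + ..., truncated after `terms` terms."""
    power = [row[:] for row in y]
    total = 0.0
    for n in range(1, terms + 1):
        total += trace(power) / n
        power = matmul(power, y)
    return total


Y = [[0.2, 0.1], [0.05, -0.3]]  # arbitrary example; eigenvalues are below 1 in modulus
print(neg_log_det_i_minus(Y), trace_series(Y))
```

The two printed numbers should agree to rounding error; with eigenvalues nearer unity more terms of the series would be needed.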
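We shall first of all treat the simple case in which $\mathbf{t}(\alpha)$ is actually independent of the $\alpha_i$
(as in the k means problem previously considered), deferring consideration of the more
complicated general case until later. Since $\mathbf{t}(\alpha)$ is independent of the $\alpha_i$, it follows firstly that
$\operatorname{var}\mathbf{t}(\alpha+\epsilon) = \operatorname{var}\mathbf{t}(\alpha) = \mathbf{V}(\alpha)$, and secondly that $\mathbf{V}(\alpha)$ is a homogeneous linear function
of the $\alpha_i$ (since $\mathbf{t}$ is a linear function of the $x_i$). Thus (2-27) becomes

$$\mathbf{X} = \mathbf{V}^{-1}(\alpha+\epsilon)\,\mathbf{V}(\alpha) - \mathbf{I}$$
$$= [\mathbf{V}(\alpha) + \Sigma\,\epsilon_i\mathbf{V}_i(\alpha)]^{-1}\,\mathbf{V}(\alpha) - \mathbf{I}$$
$$= [\mathbf{I} + \Sigma\,\epsilon_i\mathbf{V}^{-1}\mathbf{V}_i]^{-1} - \mathbf{I}$$
$$= -\Sigma\,\epsilon_i\mathbf{V}^{-1}\mathbf{V}_i + \Sigma\,\epsilon_i\epsilon_j\mathbf{V}^{-1}\mathbf{V}_i\mathbf{V}^{-1}\mathbf{V}_j - \ldots, \tag{2-30}$$

† In the second line of this equation a summation sign (referring to the $b_i$) was accidentally omitted.
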
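where we abbreviate $\mathbf{V}(\alpha)$ to $\mathbf{V}$ and use the notation

$$\mathbf{V}_i = \partial\mathbf{V}/\partial\alpha_i, \qquad \mathbf{V}_{ij} = \partial^2\mathbf{V}/\partial\alpha_i\partial\alpha_j, \ldots. \tag{2-31}$$

Also

$$\mathbf{X}^2 = \Sigma\,\epsilon_i\epsilon_j\mathbf{V}^{-1}\mathbf{V}_i\mathbf{V}^{-1}\mathbf{V}_j - \ldots. \tag{2-32}$$

Using the abbreviated notation

$$(i) = \operatorname{tr}\mathbf{V}^{-1}\mathbf{V}_i, \qquad (i,j) = \operatorname{tr}\mathbf{V}^{-1}\mathbf{V}_{ij}, \qquad (i\,|\,j) = \operatorname{tr}\mathbf{V}^{-1}\mathbf{V}_i\mathbf{V}^{-1}\mathbf{V}_j, \ldots, \tag{2-33}$$

we have

$$\operatorname{tr}\mathbf{X} = -\Sigma\,\epsilon_i(i) + \Sigma\,\epsilon_i\epsilon_j(i\,|\,j) - \ldots, \tag{2-34}$$

$$\operatorname{tr}\mathbf{X}^2 = \Sigma\,\epsilon_i\epsilon_j(i\,|\,j) - \ldots, \tag{2-35}$$

$$(\operatorname{tr}\mathbf{X})^2 = \Sigma\,\epsilon_i\epsilon_j(i)(j) - \ldots, \tag{2-36}$$

whence

$$J = [1 + \Sigma\,\epsilon_i\{-\tfrac{1}{2}(i)\Delta\} + \Sigma\,\epsilon_i\epsilon_j\{(i\,|\,j)(\tfrac{1}{2}\Delta + \tfrac{1}{4}\Delta^2) + \tfrac{1}{8}(i)(j)\Delta^2\} + \ldots]\,G_p(\xi). \tag{2-37}$$

Comparing (2-17) and (2-37) we obtain

$$\partial_i \Pr[\mathbf{t}'\mathbf{V}^{-1}\mathbf{t}(\alpha) < 2\xi] = -\tfrac{1}{2}(i)\,\Delta G_p(\xi) = \tfrac{1}{2}(i)\,\frac{\xi}{p}\,g_p(\xi), \tag{2-38}$$

$$\partial_i\partial_j \Pr[\mathbf{t}'\mathbf{V}^{-1}\mathbf{t}(\alpha) < 2\xi] = [(i\,|\,j)(\Delta + \tfrac{1}{2}\Delta^2) + \tfrac{1}{4}(i)(j)\Delta^2]\,G_p(\xi)$$
$$= \left[-\tfrac{1}{2}(i\,|\,j)\left\{\frac{\xi}{p} + \frac{\xi^2}{p(p+1)}\right\} + \tfrac{1}{4}(i)(j)\left\{\frac{\xi}{p} - \frac{\xi^2}{p(p+1)}\right\}\right] g_p(\xi), \tag{2-39}$$

where

$$g_p(\xi) = \frac{d}{d\xi}G_p(\xi) = [\Gamma(p)]^{-1}e^{-\xi}\xi^{p-1}. \tag{2-40}$$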
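The passage from the operator Δ = H − 1 to multiples of the density g_p(ξ) rests on two identities: ΔG_p(ξ) = −(ξ/p) g_p(ξ) and Δ²G_p(ξ) = {ξ/p − ξ²/p(p+1)} g_p(ξ). The sketch below checks both numerically; it assumes that H raises the suffix p by one, as stated in the text, and the function names are hypothetical.

```python
import math


def G(p, xi, terms=200):
    """Incomplete gamma ratio G_p(xi), by power series."""
    term = 1.0 / math.gamma(p + 1.0)
    total = term
    for n in range(1, terms):
        term *= xi / (p + n)
        total += term
    return math.exp(-xi) * xi ** p * total


def g(p, xi):
    """Density g_p(xi) = e^{-xi} xi^{p-1} / Gamma(p), the derivative of G_p(xi)."""
    return xi ** (p - 1.0) * math.exp(-xi) / math.gamma(p)


def delta(func):
    """Operator Delta = H - 1, where H replaces p by p + 1."""
    return lambda p, xi: func(p + 1.0, xi) - func(p, xi)


p, xi = 1.5, 2.0  # arbitrary illustrative values
first = delta(G)(p, xi)
second = delta(delta(G))(p, xi)
print(first, -(xi / p) * g(p, xi))
print(second, (xi / p - xi ** 2 / (p * (p + 1.0))) * g(p, xi))
```

Each printed pair should agree to rounding error, which is all that is needed to pass between the two forms of the first and second derivatives.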


By a straightforward but tedious extension of the algebra we can obtain 
0;0;0;, Pr[...] = [(é| j | &) (23+ H? + B) + 3((¢| 9) (kK) + (| 2) (9) +9 | 4) @) (B-2Z) 
+ &(#) (9) (A) CCU Ga ea (2-41) 
8,00, Pr(...]={— (i j] BID + (| Blt] 9) +t] j|) t+ 08+ B*+ B) 
mean )+(¢ | 5|0) (k) +] E10) G) + GB] ()) (4B) 
—4((i| j) (|) +E] &) (9 |0) + |) (| &)) (B+ BS— B*—E) 
—4((¢| 9) (4) (I) + (| k) (9) () +... to 6 terms) (H*— #* — H? + E) 
— Js(i) (J) (&) (1) (B*— 3E8 + 3E*— E)] 9, (£). (2-42) 
There is, however, one rather tiresome point that arises in this sort of work. If, for example,

$$\Sigma\,a_{ijk}\,\epsilon_i\epsilon_j\epsilon_k = \Sigma\,b_{ijk}\,\epsilon_i\epsilon_j\epsilon_k, \tag{2-43}$$

we are only entitled to infer that $a_{ijk} = b_{ijk}$ if both these sets of quantities are completely
symmetrical in their suffixes. Now a trace such as $(i\,|\,j\,|\,k\,|\,l)$ is unaltered only by cyclic
permutations and reversals of the order of its letters; thus $(i\,|\,j\,|\,k)$ is completely symmetrical
in its letters, but $(i\,|\,j\,|\,k\,|\,l)$ and $(i\,|\,j)(k)$ are not, and must be replaced by

$$\tfrac{1}{3}\{(i\,|\,j\,|\,k\,|\,l) + (i\,|\,k\,|\,j\,|\,l) + (i\,|\,j\,|\,l\,|\,k)\} \quad\text{and}\quad \tfrac{1}{3}\{(i\,|\,j)(k) + (i\,|\,k)(j) + (j\,|\,k)(i)\}$$

respectively. This has been done in (2-41) and (2-42).
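Substituting (2-39) into (2-14) we obtain

$$h_1(\alpha) = \Sigma\,\frac{\alpha_i^2}{\nu_i}\left[\tfrac{1}{2}(i\,|\,i)\left\{\frac{\xi^2}{p(p+1)} + \frac{\xi}{p}\right\} + \tfrac{1}{4}(i)^2\left\{\frac{\xi^2}{p(p+1)} - \frac{\xi}{p}\right\}\right]. \tag{2-44}$$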
Differentiating (2-44) twice with respect to $\alpha_i$, and substituting the results, together with
(2-38), (2-39), (2-41) and (2-42) into (2-15), we finally obtain, as the approximation of
order $\nu^{-2}$,


2h(a) = xX 2445 13 Xa Xe) (0 | t) + (Xa — Xz) (2)? ] 





1 (,_1=2) [yal sal? 
" fc ( os = ) 2 =, [2(X4+ Xa) (¢ | *) + (Xa Xa) a] 
a ~ a 
—2z al? 2(Xa + X2) (| t) + (Xa— Xe) (2)7] 


3 
+ 405 [2(X4+ Xa) (i | ¢] 4) + (Xa— Xe) (| 4) (@)] 


A205 Bay 5) 
— BTR, (2(Xat Xa) (26 | t] F/I] ILE] I) + Xa Xe) (26 | #19) G) + 1 IPN 
wd 


+(X,—-1)2 = [2(X4 + Xe) ( | t) (¢) + (X4— Xz) (@)7] 


aja; ied 9 
~ (a= WR (Xa + Xa) (| E| 5) (I) + (Xa Xe) 1) OO 
> aza? 9 sas 9 “9 *\9 
— BE [2(Xa— 1) (| 4) + (Xa — 2X2 + 1) PY) L2(Xa + Xe) (G1 I) + (Xa Xe) (97 
aa . . 
—4%2 “3 (8(X6 + Xe + Xo) (i | i | 4) + 6(X—— Xe) (7 | 4) (¢) + (Xe — 2X4 + Xe) (7)7] 
1 24; " 
+7g2 _— [16(x3+ Xe +Xat Xe) (2i 14] J] I) +(e] 9] 7] 9) 
"4 
+32(Xs— Xe) (¢|¢| 9) gy gl Xa— Xa) ((¢| 4) GF | J) + 2C¢ | J)*) 
+ 4(Xs— Xe— Xe + Xe) ((e| 4) (G)? + 2 | 9) (9) 
+ (Xs— 3X6 + 3X4 — Xe) (# te Ao *). (2-45) 
Here $\chi_{2s}$ denotes $\chi^{2s}/r(r+2)\ldots(r+2s-2)$, while of course $(i)$ now denotes $\operatorname{tr}\mathbf{V}^{-1}(a)\mathbf{V}_i(a)$,
and so on. The first line of this result contains the terms of order 1 and $\nu^{-1}$, while the curled
bracket contains the term of order $\nu^{-2}$.
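We now take up the general case. From (2-2),

$$\hat{\boldsymbol{\theta}}(\alpha+\epsilon) = \mathbf{C}^{-1}(\alpha+\epsilon)\,\mathbf{d}(\alpha+\epsilon) = \mathbf{C}^{-1}(\alpha+\epsilon)\,\mathbf{B}'(\boldsymbol{\alpha}+\boldsymbol{\epsilon})^{-1}\mathbf{x}. \tag{2-46}$$

Hence

$$\operatorname{var}\hat{\boldsymbol{\theta}}(\alpha+\epsilon) = \mathbf{C}^{-1}(\alpha+\epsilon)\,\mathbf{B}'(\boldsymbol{\alpha}+\boldsymbol{\epsilon})^{-1}\boldsymbol{\alpha}(\boldsymbol{\alpha}+\boldsymbol{\epsilon})^{-1}\mathbf{B}\,\mathbf{C}^{-1}(\alpha+\epsilon)$$
$$= \mathbf{C}^{-1}(\alpha+\epsilon)\,\mathbf{C}((\alpha+\epsilon)^2/\alpha)\,\mathbf{C}^{-1}(\alpha+\epsilon), \tag{2-47}$$

and $\operatorname{var}\mathbf{t}(\alpha+\epsilon)$ is the matrix consisting of the leading r rows and columns of this, or say

$$\operatorname{var}\mathbf{t}(\alpha+\epsilon) = [\mathbf{C}^{-1}(\alpha+\epsilon)\,\mathbf{C}((\alpha+\epsilon)^2/\alpha)\,\mathbf{C}^{-1}(\alpha+\epsilon)]_{11}. \tag{2-48}$$

Thus, from (2-27),

$$\mathbf{X} = \mathbf{V}^{-1}(\alpha+\epsilon)\,[\boldsymbol{\Gamma}(\alpha+\epsilon)\,\boldsymbol{\Gamma}^{-1}((\alpha+\epsilon)^2/\alpha)\,\boldsymbol{\Gamma}(\alpha+\epsilon)]_{11} - \mathbf{I}, \tag{2-49}$$
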
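where we have introduced a new symbol $\boldsymbol{\Gamma} = \mathbf{C}^{-1}$. Now

$$\boldsymbol{\Gamma}^{-1}((\alpha+\epsilon)^2/\alpha) = \boldsymbol{\Gamma}^{-1}(\alpha + 2\epsilon + \epsilon^2/\alpha)$$
$$= \left[\boldsymbol{\Gamma} + \Sigma\left(2\epsilon_i + \frac{\epsilon_i^2}{\alpha_i}\right)\boldsymbol{\Gamma}_i + 2\Sigma\,\epsilon_i\epsilon_j\boldsymbol{\Gamma}_{ij} + \ldots\right]^{-1}$$
$$= \left[\mathbf{I} + 2\Sigma\,\epsilon_i\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_i + \Sigma\,\epsilon_i\epsilon_j\left(\frac{\delta_{ij}}{\alpha_i}\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_i + 2\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_{ij}\right) + \ldots\right]^{-1}\boldsymbol{\Gamma}^{-1}$$
$$= \left[\mathbf{I} - 2\Sigma\,\epsilon_i\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_i + \Sigma\,\epsilon_i\epsilon_j\left(-\frac{\delta_{ij}}{\alpha_i}\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_i - 2\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_{ij} + 4\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_i\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_j\right) + \ldots\right]\boldsymbol{\Gamma}^{-1}, \tag{2-50}$$

where $\boldsymbol{\Gamma}$ denotes $\boldsymbol{\Gamma}(\alpha)$, $\boldsymbol{\Gamma}_i = \partial\boldsymbol{\Gamma}/\partial\alpha_i$, and so forth. Thus

$$\boldsymbol{\Gamma}(\alpha+\epsilon)\,\boldsymbol{\Gamma}^{-1}((\alpha+\epsilon)^2/\alpha)\,\boldsymbol{\Gamma}(\alpha+\epsilon) = \boldsymbol{\Gamma}\,[\mathbf{I} + \Sigma\,\epsilon_i\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_i + \tfrac{1}{2}\Sigma\,\epsilon_i\epsilon_j\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_{ij} + \ldots]$$
$$\times \left[\mathbf{I} - 2\Sigma\,\epsilon_i\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_i + \Sigma\,\epsilon_i\epsilon_j\left(-\frac{\delta_{ij}}{\alpha_i}\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_i - 2\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_{ij} + 4\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_i\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_j\right) + \ldots\right]$$
$$\times [\mathbf{I} + \Sigma\,\epsilon_i\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_i + \tfrac{1}{2}\Sigma\,\epsilon_i\epsilon_j\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_{ij} + \ldots]$$
$$= \boldsymbol{\Gamma} + \Sigma\,\epsilon_i\epsilon_j\left(-\frac{\delta_{ij}}{\alpha_i}\boldsymbol{\Gamma}_i - \boldsymbol{\Gamma}_{ij} + \boldsymbol{\Gamma}_i\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_j\right) + \ldots. \tag{2-51}$$

Hence, since $\boldsymbol{\Gamma}_{11}$ (the leading r rows and columns of $\boldsymbol{\Gamma}$) is $\mathbf{V}$, (2-49) becomes

$$\mathbf{X} = [\mathbf{I} + \Sigma\,\epsilon_i\mathbf{V}^{-1}\mathbf{V}_i + \tfrac{1}{2}\Sigma\,\epsilon_i\epsilon_j\mathbf{V}^{-1}\mathbf{V}_{ij} + \ldots]^{-1}\,\mathbf{V}^{-1}$$
$$\times \left[\mathbf{V} + \Sigma\,\epsilon_i\epsilon_j\left(-\frac{\delta_{ij}}{\alpha_i}\mathbf{V}_i - \mathbf{V}_{ij} + (\boldsymbol{\Gamma}_i\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_j)_{11}\right) + \ldots\right] - \mathbf{I}$$
$$= -\Sigma\,\epsilon_i\mathbf{V}^{-1}\mathbf{V}_i + \Sigma\,\epsilon_i\epsilon_j\left(-\frac{\delta_{ij}}{\alpha_i}\mathbf{V}^{-1}\mathbf{V}_i - \tfrac{3}{2}\mathbf{V}^{-1}\mathbf{V}_{ij} + \mathbf{V}^{-1}\mathbf{V}_i\mathbf{V}^{-1}\mathbf{V}_j + \mathbf{V}^{-1}(\boldsymbol{\Gamma}_i\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_j)_{11}\right) + \ldots. \tag{2-52}$$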
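The work of evaluating J and the derivatives now proceeds as before, and the final result is

$$2h(a) = \chi^2 - 2\chi_2\,\Sigma\,\frac{a_i}{\nu_i}(i) + \Sigma\,\frac{a_i^2}{\nu_i}\left[-3\chi_2(i,i) + (\chi_4 + \chi_2)(i\,|\,i) + \tfrac{1}{2}(\chi_4 - \chi_2)(i)^2 + 2\chi_2\operatorname{tr}\mathbf{V}^{-1}(\boldsymbol{\Gamma}_i\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_i)_{11}\right] + O(\nu^{-2}). \tag{2-53}$$

Of course $\boldsymbol{\Gamma}$ now denotes $\boldsymbol{\Gamma}(a)$, and so forth.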
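In at least one application it is preferable to write the result (2-53) in terms of the matrices

$$\mathbf{A} = \mathbf{V}^{-1}, \qquad \mathbf{C} = \boldsymbol{\Gamma}^{-1}, \tag{2-54}$$

and their derivatives. It is easy to prove that

$$(i) = -\operatorname{tr}\mathbf{A}^{-1}\mathbf{A}_i, \tag{2-55}$$
$$(i,i) = -\operatorname{tr}\mathbf{A}^{-1}\mathbf{A}_{ii} + 2\operatorname{tr}(\mathbf{A}^{-1}\mathbf{A}_i)^2, \tag{2-56}$$
$$(i\,|\,i) = \operatorname{tr}(\mathbf{A}^{-1}\mathbf{A}_i)^2, \tag{2-57}$$
$$\boldsymbol{\Gamma}_i\boldsymbol{\Gamma}^{-1}\boldsymbol{\Gamma}_i = \mathbf{C}^{-1}\mathbf{C}_i\mathbf{C}^{-1}\mathbf{C}_i\mathbf{C}^{-1}. \tag{2-58}$$

Hence

$$2h(a) = \chi^2 + 2\chi_2\,\Sigma\,\frac{a_i}{\nu_i}\operatorname{tr}\mathbf{A}^{-1}\mathbf{A}_i + \Sigma\,\frac{a_i^2}{\nu_i}\left[3\chi_2\operatorname{tr}\mathbf{A}^{-1}\mathbf{A}_{ii} + (\chi_4 - 5\chi_2)\operatorname{tr}(\mathbf{A}^{-1}\mathbf{A}_i)^2 + \tfrac{1}{2}(\chi_4 - \chi_2)(\operatorname{tr}\mathbf{A}^{-1}\mathbf{A}_i)^2 + 2\chi_2\operatorname{tr}\mathbf{A}(\mathbf{C}^{-1}\mathbf{C}_i\mathbf{C}^{-1}\mathbf{C}_i\mathbf{C}^{-1})_{11}\right] + O(\nu^{-2}). \tag{2-59}$$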
One of the formulae (2-53) or (2-59) replaces the first line of (2-45) in the general case.
I have not worked out the term of order $\nu^{-2}$ in this case; it would obviously be extremely
complicated, and is unlikely to be of practical use.

We may summarize the work as follows. To test the hypothesis $\theta_1 = \ldots = \theta_r = 0$, we first
obtain the maximum-likelihood solution for $\boldsymbol{\theta}$ under the alternative hypothesis, namely,
$\hat{\boldsymbol{\theta}} = \mathbf{C}^{-1}\mathbf{d}$, and at the same time its variance matrix $\mathbf{C}^{-1}$, as though the estimates $a_i$ were the
true variances $\alpha_i$. Denote the vector formed by the first r elements of this solution by $\mathbf{t}$,
and the corresponding part of the variance matrix by $\mathbf{V}$. If the $a_i$ are based on sufficiently
large numbers $\nu_i$ of degrees of freedom, then $\mathbf{t}'\mathbf{V}^{-1}\mathbf{t}$ may be compared directly with the
appropriate significance point of $\chi^2$ on r degrees of freedom. If not, a more accurate test
will be to compare it with $2h(a)$ given by (2-45) in the simple case and by (2-53) or (2-59)
in the general case.

A slight generalization may be made in our initial assumptions, which may be useful in
some applications. Instead of taking the variance of each $x_i$ as $\alpha_i$, we may assume that

$$\operatorname{var} x_i = \sum_{j=1}^{m}\beta_{ij}\alpha_j, \tag{2-60}$$

where $\boldsymbol{\beta} = (\beta_{ij})$, like $\mathbf{B}$ of (2-1), is a matrix of known constants. The $\alpha_j$ are estimated by
quantities $a_j$ as before, and the only alteration necessary in our work is that (2-3) must be
replaced by

$$\mathbf{C} = \mathbf{B}'(\operatorname{var}\mathbf{x})^{-1}\mathbf{B}, \qquad \mathbf{d} = \mathbf{B}'(\operatorname{var}\mathbf{x})^{-1}\mathbf{x}, \tag{2-61}$$

where $\operatorname{var}\mathbf{x}$ is a diagonal matrix whose diagonal elements are given by the previous equation.
Of course $\mathbf{C}$ and $\mathbf{d}$ are still functions of the $\alpha_j$, so that we are still entitled to the notation
$\mathbf{t}(\alpha)$, etc.

The commonest use of this generalization may well be in problems where some of the 
x; may be assumed to have the same variance, or known multiples of a common variance; 
such a situation could arise in a two-factor lay-out (see § 4) in which the error variance only 
depended on the level of one factor, being independent of the other. 

An interesting special case of (2-60) occurs when there is only one $\alpha$; we may then write
$\operatorname{var} x_i = \beta_i\alpha$, say, and the hypothesis reverts to the classical form in which there is only
one variance, $\alpha$, to be estimated, the quantities $\beta_i^{-1}$ being known 'weights'. The statistic
$\mathbf{t}'\mathbf{V}^{-1}\mathbf{t}$ reduces to r times the quantity taken to be distributed as F with r and $\nu$ degrees of
freedom in the classical case, and

$$2h(a) = \chi^2\left[1 + \frac{\chi^2 - (r-2)}{2\nu} + \frac{4\chi^4 - 11(r-2)\chi^2 + (r-2)(7r-10)}{24\nu^2}\right] + O(\nu^{-3}). \tag{2-62}$$

This formula, then, is an expansion of r times the significance point of an F distribution with
r and $\nu$ degrees of freedom, in terms of the corresponding significance point of a $\chi^2$ distribution
with r degrees of freedom. The first two terms were given by Welch (1951).
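The expansion of r times the F point in powers of 1/ν can be checked in a case where the F point has a closed form. For r = 2 the upper 100P% point satisfies 2F = ν(P^{−2/ν} − 1), while the χ² point on 2 d.f. is −2 log P. The sketch below compares the two; it assumes the second-order form χ²[1 + {χ² − (r − 2)}/2ν + {4χ⁴ − 11(r − 2)χ² + (r − 2)(7r − 10)}/24ν²], and the function name is hypothetical.

```python
import math


def r_times_f_point(chi2_point, r, nu):
    """Second-order expansion of r * F(r, nu) from the chi-square point on r d.f."""
    c = chi2_point
    first = (c - (r - 2)) / (2.0 * nu)
    second = (4.0 * c * c - 11.0 * (r - 2) * c + (r - 2) * (7 * r - 10)) / (24.0 * nu * nu)
    return c * (1.0 + first + second)


# Closed-form check for r = 2 at the upper 5% level with nu = 30.
level, nu = 0.05, 30
chi2_point = -2.0 * math.log(level)
exact = nu * (level ** (-2.0 / nu) - 1.0)
approx = r_times_f_point(chi2_point, r=2, nu=nu)
print(exact, approx)
```

For r = 1 the same function gives an approximation to the square of the two-sided Student point; with ν = 20 at the 5% level it can be compared with 2.086² ≈ 4.351.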


3. SIMPLE EXAMPLES OF THE UNIVARIATE LINEAR HYPOTHESIS 
The simplest example one can think of is that in which the structural relations (2-1) are
simply

$$\mathcal{E}(x_i) = \theta_i \quad (i = 1, \ldots, k), \tag{3-1}$$

and the null hypothesis to be tested is that

$$\theta_1 = \ldots = \theta_k = 0. \tag{3-2}$$

This arises if the $x_i$ are the means of k samples, and the null hypothesis states that the
corresponding population means have specified values. No generality is lost by taking these
values to be zero (see footnote on p. 19).


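In this case the maximum-likelihood solution for the $\theta_i$ under the alternative hypothesis
is

$$t_i = x_i \quad (i = 1, \ldots, k), \tag{3-3}$$

whence the variance matrix of $\mathbf{t}$ is

$$\mathbf{V} = \operatorname{diag}(\alpha_1, \ldots, \alpha_k). \tag{3-4}$$

Since $\mathbf{t}$ does not depend on the variances, we have an example of the simple case, to which
(2-45) applies. Writing

$$w_i = 1/\alpha_i \quad (i = 1, \ldots, k), \tag{3-5}$$

we have

$$\mathbf{V}^{-1} = \operatorname{diag}(w_1, \ldots, w_k), \tag{3-6}$$
$$\mathbf{V}_i = \operatorname{diag}(0, \ldots, 0, 1, 0, \ldots, 0), \tag{3-7}$$

having unity in the ith place only, so that

$$(i\,|\,i) = (i)^2 = w_i^2, \tag{3-8}$$
$$(i\,|\,i\,|\,i) = (i\,|\,i)(i) = (i)^3 = w_i^3, \tag{3-9}$$
$$(i\,|\,i\,|\,j\,|\,j) = (i\,|\,j\,|\,i\,|\,j) = (i\,|\,i\,|\,j)(j) = (i\,|\,j)^2 = (i\,|\,j)(i)(j) = \delta_{ij}w_i^4, \tag{3-10}$$
$$(i\,|\,i)(j\,|\,j) = (i\,|\,i)(j)^2 = (i)^2(j)^2 = w_i^2w_j^2. \tag{3-11}$$

Thus the two quantities to be compared for test purposes are

$$\mathbf{t}'\mathbf{V}^{-1}\mathbf{t} = \Sigma\,w_ix_i^2 \tag{3-12}$$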
and 2h(a) = x?+4(3x4+ x2) Zz? 


1 k—2 
+ | 2(3X—— Xe) Bry +76 (1- =) (3X4 + Xe)” 


— 2(3X%4— 2X2 — 1) (3X4 + Xe) + (9X8 — 3X6 — Bxa— xs) evry +O(v-%). (3-13) 


$\chi^2$ is based on k degrees of freedom and $\chi_{2s}$ denotes $\chi^{2s}/k(k+2)\ldots(k+2s-2)$.

It will be noticed that in this example $2h(a)$ does not actually depend on the $a_i$; this is
because $\Sigma\,w_ix_i^2$ itself has a distribution which is independent of the unknown variances;
it is, in fact, the sum of k quantities distributed independently in F distributions with
$(1, \nu_1), \ldots, (1, \nu_k)$ degrees of freedom.

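As a second example, suppose that we have

$$\mathcal{E}(x_i) = \mu_i \quad (i = 1, \ldots, k), \tag{3-14}$$

and that the null hypothesis is

$$\mu_1 = \ldots = \mu_k, \tag{3-15}$$

that is, that the means are all equal, their common value not being specified. This hypothesis,
which only imposes (k − 1) constraints, is not in the canonical form, but is easily put into
this form by means of the transformation

$$\mu_i = \theta_i + \theta_k \quad (i = 1, \ldots, k-1), \tag{3-16}$$
$$\mu_k = \theta_k, \tag{3-17}$$

when it becomes

$$\theta_1 = \ldots = \theta_{k-1} = 0. \tag{3-18}$$

The maximum-likelihood solution for $\boldsymbol{\theta}$ under the alternative hypothesis gives us

$$\mathbf{t} = \{x_1 - x_k, \ldots, x_{k-1} - x_k\}, \tag{3-19}$$

whence

$$\mathbf{V} = \begin{pmatrix} \alpha_1 + \alpha_k & \alpha_k & \ldots & \alpha_k \\ \alpha_k & \alpha_2 + \alpha_k & \ldots & \alpha_k \\ \vdots & \vdots & & \vdots \\ \alpha_k & \alpha_k & \ldots & \alpha_{k-1} + \alpha_k \end{pmatrix}, \tag{3-20}$$

$$\mathbf{V}^{-1} = \begin{pmatrix} w_1 - w_1^2/w & -w_1w_2/w & \ldots & -w_1w_{k-1}/w \\ -w_1w_2/w & w_2 - w_2^2/w & \ldots & -w_2w_{k-1}/w \\ \vdots & \vdots & & \vdots \\ -w_1w_{k-1}/w & -w_2w_{k-1}/w & \ldots & w_{k-1} - w_{k-1}^2/w \end{pmatrix}, \tag{3-21}$$

where

$$w_i = 1/\alpha_i, \qquad w = \sum_{i=1}^{k} w_i. \tag{3-22}$$

Thus

$$\mathbf{V}_i = \operatorname{diag}(0, \ldots, 0, 1, 0, \ldots, 0) \quad (i = 1, \ldots, k-1), \tag{3-23}$$

$$\mathbf{V}_k = \begin{pmatrix} 1 & 1 & \ldots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \ldots & 1 \end{pmatrix}, \tag{3-24}$$

and

$$(i\,|\,i) = (i)^2 = (w_i - w_i^2/w)^2, \tag{3-25}$$
$$(i\,|\,i\,|\,i) = (i\,|\,i)(i) = (i)^3 = (w_i - w_i^2/w)^3, \tag{3-26}$$
$$(i\,|\,i\,|\,j\,|\,j) = (i\,|\,i\,|\,j)(j) = (i\,|\,j)(i)(j) = \left(w_i - \frac{w_i^2}{w}\right)\left(w_j - \frac{w_j^2}{w}\right)\left(w_i\delta_{ij} - \frac{w_iw_j}{w}\right)^2, \tag{3-27}$$
$$(i\,|\,j\,|\,i\,|\,j) = (i\,|\,j)^2 = \left(w_i\delta_{ij} - \frac{w_iw_j}{w}\right)^4, \tag{3-28}$$
$$(i\,|\,i)(j\,|\,j) = (i\,|\,i)(j)^2 = (i)^2(j)^2 = \left(w_i - \frac{w_i^2}{w}\right)^2\left(w_j - \frac{w_j^2}{w}\right)^2. \tag{3-29}$$

(These latter results hold for $i, j = 1, \ldots, k$, but they are most easily proved when $i, j \neq k$,
the results when i or j = k following on consideration of the fact that all the traces involved
are left invariant by non-singular linear transformations of the $t_i$; thus in spite of the
special part played by $x_k$ in (3-19), it could be replaced by any other x without altering the
results (3-25)-(3-29), so that the latter cannot be restricted to the case $i, j \neq k$.)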
Since t does not depend on the variances, we again use (2-45), which gives 


»,\ 2 
2h(a) = x2+H3x4+ x2) 25 (1-Z!) 


v 


(73 ( _&-3 yy) 52 ( - 2)" 
+179 (1- Sa) | xe x) E> (1 * 


+ (3X4 + Xe) [8(2Ra, — 5 Ry + 4Ro, — 2R7, + 4R,, Ry, — 3RiQ) 
+ 8(X2— 1) (Re, — 2Rge + Rog — RE, + 2Ry, Rye — Rip) 
— (3X4 — 2X2— 1) (Rip — 4Ryo Ry + 2Ryo Rye + 4Rj, — 4R,, Rye + Ri,)] 
— (5X6 + 2X4 + X2) (Roo — 3Rq + 3Rye — Rog) 
+ 4(9Xg + 3X6 + 3X4 + Xz) (Rog — 4B; + 5 Rye — 2Rys + Ri, — 2Ry, Ry. + Rip) 
+ (3X5 + 3X6 + X4+ Xo) (Ray — 4Ro, + 6Rqe — 4,3 + Ri) 


+ 7¢(9X—— 3X6 — 5X4 — Xe) (Rip — 4Rp Ry + 2Ryy Ryo + 4R3, — 4Ry, Ryo + Ri)| +O(v-*), 


(3-30) 


where

$$R_{st} = \sum_i \frac{1}{\nu_i^s}\left(\frac{w_i}{w}\right)^t, \tag{3-31}$$

$\chi^2$ is based on (k − 1) degrees of freedom, and $\chi_{2s}$ denotes
$\chi^{2s}/(k-1)(k+1)\ldots(k+2s-3)$.

It is well known from large sample theory, although it may be proved directly from (3-19)
and (3-21), that

$$\mathbf{t}'\mathbf{V}^{-1}\mathbf{t} = \Sigma\,w_ix_i^2 - (\Sigma\,w_ix_i)^2/w \tag{3-32}$$

in this problem. The test carried out by comparing (3-32) with (3-30) is the same as that
obtained in the previous paper (although the formula for $2h(a)$ is arranged rather more
compactly here).
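The test for the equality of k means can be sketched as follows, keeping only the terms of order 1 and ν^{-1} in the critical value, namely χ² + ½(3χ₄ + χ₂) Σ (1/ν_i)(1 − w_i/w)², and omitting the ν^{-2} term of (3-30). The function and argument names are hypothetical, the weights are formed from the estimated variances a_i of the means, and the χ² point is supplied by the user rather than computed.

```python
def james_first_order(means, var_of_means, dfs, chi2_point):
    """Return (statistic, first-order critical value) for testing equality of k means.

    means        : sample means x_i
    var_of_means : estimates a_i of the variances of the x_i
    dfs          : degrees of freedom nu_i of the a_i
    chi2_point   : tabled chi-square point on k - 1 d.f. (supplied by the user)
    """
    k = len(means)
    r = k - 1
    w = [1.0 / a for a in var_of_means]
    w_sum = sum(w)
    weighted_mean = sum(wi * xi for wi, xi in zip(w, means)) / w_sum
    # Statistic of the form sum(w x^2) - (sum(w x))^2 / w.
    stat = sum(wi * xi * xi for wi, xi in zip(w, means)) - w_sum * weighted_mean ** 2
    chi_2 = chi2_point / r
    chi_4 = chi2_point ** 2 / (r * (r + 2))
    corr = sum((1.0 - wi / w_sum) ** 2 / nu for wi, nu in zip(w, dfs))
    critical = chi2_point + 0.5 * (3.0 * chi_4 + chi_2) * corr
    return stat, critical


# Illustrative figures only: three means, unit variances, 10 d.f. each, 5% level (chi-square on 2 d.f.).
stat, critical = james_first_order([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [10, 10, 10], 5.991)
print(stat, critical)
```

The hypothesis would be rejected when the statistic exceeds the critical value; as the ν_i increase the critical value falls back to the χ² point itself.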


4, APPLICATION TO A TWO-FACTOR EXPERIMENT WITH REPLICATION 


We suppose that we have a factorial experiment with two factors A and B, A being applied
at p levels and B at q levels, and that at the ith level of A and the jth level of B we have
$\nu_{ij} + 1$ replicate determinations. We denote the individual results by $x_{ijl}$ $(i = 1, \ldots, p;$
$j = 1, \ldots, q;\ l = 1, \ldots, \nu_{ij} + 1)$ and assume that

$$\mathcal{E}(x_{ijl}) = \mu + \alpha_i + \beta_j + \gamma_{ij}. \tag{4-1}$$

We also assume that the deviations from these expectations are independently and normally
distributed about zero, with variances which may differ from treatment combination to
treatment combination, but which are constant for all $\nu_{ij} + 1$ observations with treatment
combination (i, j). Let $x_{ij}$ denote the mean of these $\nu_{ij} + 1$ observations and let $a_{ij}$ denote
the estimate of its variance, based on $\nu_{ij}$ degrees of freedom, derived from the 'within
treatments' sums of squares. That is,

$$x_{ij} = \frac{\sum_l x_{ijl}}{\nu_{ij} + 1}, \qquad a_{ij} = \frac{\sum_l (x_{ijl} - x_{ij})^2}{\nu_{ij}(\nu_{ij} + 1)} \quad (i = 1, \ldots, p;\ j = 1, \ldots, q). \tag{4-3}$$

An overall test for the significance of all the treatment effects (main effects and interactions)
is afforded by treating the $x_{ij}$ as k = pq sample means, as considered in the previous section.


We now consider the more commonly required tests for various groups of constants. 


(a) Test for interaction 
The null hypothesis is that all the $\gamma_{ij}$ are zero, the $\alpha_i$ and $\beta_j$ remaining unspecified; the
alternative hypothesis is that none of the parameters are specified (always subject, of
course, to the conditions (4-2)).
In order to express this hypothesis in canonical form we write, in place of (4-1) and (4-2),

$$\mathcal{E}(x_{ij}) = \theta_{ij} + \phi_i + \psi_j + \omega \quad (i = 1, \ldots, p-1;\ j = 1, \ldots, q-1), \tag{4-4}$$
$$\mathcal{E}(x_{iq}) = \phi_i + \omega \quad (i = 1, \ldots, p-1), \tag{4-5}$$
$$\mathcal{E}(x_{pj}) = \psi_j + \omega \quad (j = 1, \ldots, q-1), \tag{4-6}$$
$$\mathcal{E}(x_{pq}) = \omega, \tag{4-7}$$

there being no restrictions corresponding to (4-2). The null hypothesis is now that all the
$\theta_{ij}$ vanish.
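The maximum-likelihood estimates of the $\theta_{ij}$ are

$$t_{ij} = x_{ij} - x_{iq} - x_{pj} + x_{pq} \quad (i = 1, \ldots, p-1;\ j = 1, \ldots, q-1), \tag{4-8}$$

and we have an example of the simple case. Hence the variance matrix of the vector $\mathbf{t}$ is

$$\mathbf{V} = \begin{pmatrix}
\alpha_{11} + \alpha_{1q} + \alpha_{p1} + \alpha_{pq} & \alpha_{1q} + \alpha_{pq} & \alpha_{p1} + \alpha_{pq} & \alpha_{pq} \\
\alpha_{1q} + \alpha_{pq} & \alpha_{12} + \alpha_{1q} + \alpha_{p2} + \alpha_{pq} & \alpha_{pq} & \alpha_{p2} + \alpha_{pq} \\
\alpha_{p1} + \alpha_{pq} & \alpha_{pq} & \alpha_{21} + \alpha_{2q} + \alpha_{p1} + \alpha_{pq} & \alpha_{2q} + \alpha_{pq} \\
\alpha_{pq} & \alpha_{p2} + \alpha_{pq} & \alpha_{2q} + \alpha_{pq} & \alpha_{22} + \alpha_{2q} + \alpha_{p2} + \alpha_{pq}
\end{pmatrix}, \tag{4-9}$$

where, for simplicity, we have taken the case p = q = 3; the elements of $\mathbf{t}$ are arranged in
dictionary order, thus: $\{t_{11}, t_{12}, t_{21}, t_{22}\}$. This is easily differentiated to give the first-order
derivatives $\mathbf{V}_{ij} = \partial\mathbf{V}/\partial\alpha_{ij}$:

$$\mathbf{V}_{11} = \operatorname{diag}(1, 0, 0, 0), \quad \mathbf{V}_{12} = \operatorname{diag}(0, 1, 0, 0), \quad \mathbf{V}_{21} = \operatorname{diag}(0, 0, 1, 0), \quad \mathbf{V}_{22} = \operatorname{diag}(0, 0, 0, 1), \tag{4-10}$$

$$\mathbf{V}_{1q} = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad \mathbf{V}_{2q} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}, \tag{4-11}$$

$$\mathbf{V}_{p1} = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad \mathbf{V}_{p2} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}, \tag{4-12}$$

$$\mathbf{V}_{pq} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}. \tag{4-13}$$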


The author has not succeeded in finding a simple form for $\mathbf{V}^{-1}$, corresponding to (3-21) of
the last section, but it can of course be calculated numerically in any given case (see also
below), and we shall denote its (ij, kl)th element by $v^{ij,kl}$. Then

$$(ij\,|\,ij) = (ij)^2 = (v^{ij,ij})^2 \quad (i = 1, \ldots, p-1;\ j = 1, \ldots, q-1), \tag{4-14}$$
$$(iq\,|\,iq) = (iq)^2 = (v^{i\cdot,i\cdot})^2 \quad (i = 1, \ldots, p-1), \tag{4-15}$$
$$(pj\,|\,pj) = (pj)^2 = (v^{\cdot j,\cdot j})^2 \quad (j = 1, \ldots, q-1), \tag{4-16}$$
$$(pq\,|\,pq) = (pq)^2 = (v^{\cdot\cdot,\cdot\cdot})^2, \tag{4-17}$$

where $(ij\,|\,ij)$ denotes $\operatorname{tr}(\mathbf{V}^{-1}\mathbf{V}_{ij})^2$, etc., and the dots denote summation over the indices
omitted, from 1 up to (p − 1) or from 1 up to (q − 1) as the case may be. If we make the
convention that an index p or q, as the case may be, is to have this same meaning of
summation, then (4-14)-(4-17) can be written as one formula:

$$(ij\,|\,ij) = (ij)^2 = (v^{ij,ij})^2 \quad (i = 1, \ldots, p;\ j = 1, \ldots, q). \tag{4-18}$$

Substituting into (2-45),

$$2h(a) = \chi^2 + \tfrac{1}{2}(3\chi_4 + \chi_2)\sum_{i,j}\frac{(a_{ij}v^{ij,ij})^2}{\nu_{ij}} + O(\nu^{-2}), \tag{4-19}$$

the terms with i = p or j = q being interpreted according to the above convention. $\chi^2$ is
based on r = (p − 1)(q − 1) degrees of freedom, and $\chi_{2s}$ denotes $\chi^{2s}/r(r+2)\ldots(r+2s-2)$.
(The author has evaluated the next term in the expansion, but it is not thought to be of
sufficient interest to give the rather complicated formula here.)
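The interaction test can be sketched for a general p × q lay-out as follows. The sketch forms the contrasts t_ij = x_ij − x_iq − x_pj + x_pq, builds V from the contrast coefficients, inverts it numerically, and evaluates the critical value to order ν^{-1} in the form χ² + ½(3χ₄ + χ₂) Σ (a_ij × bracket)²/ν_ij, where the bracket for each cell is the sum of the elements of V^{-1} picked out by that cell's contrast coefficients (the summation convention described above). All names are hypothetical and the χ² point is supplied by the user.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (plain lists, no external libraries)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda rr: abs(aug[rr][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for rr in range(n):
            if rr != col:
                factor = aug[rr][col]
                aug[rr] = [a - factor * b for a, b in zip(aug[rr], aug[col])]
    return [row[n:] for row in aug]


def interaction_test(x, a, nu, chi2_point):
    """x, a, nu: p-by-q nested lists of cell means, estimated variances of the means, and d.f."""
    p, q = len(x), len(x[0])
    cells = [(g, h) for g in range(p) for h in range(q)]
    pairs = [(i, j) for i in range(p - 1) for j in range(q - 1)]  # dictionary order
    r = len(pairs)

    def coef(i, j, g, h):
        # Coefficient of the cell mean x[g][h] in t_ij = x_ij - x_iq - x_pj + x_pq.
        return ((g == i) - (g == p - 1)) * ((h == j) - (h == q - 1))

    t = [sum(coef(i, j, g, h) * x[g][h] for g, h in cells) for i, j in pairs]
    V = [[sum(coef(i, j, g, h) * coef(k, l, g, h) * a[g][h] for g, h in cells)
          for k, l in pairs] for i, j in pairs]
    V_inv = invert(V)
    stat = sum(t[m] * V_inv[m][n] * t[n] for m in range(r) for n in range(r))

    chi_2 = chi2_point / r
    chi_4 = chi2_point ** 2 / (r * (r + 2))
    correction = 0.0
    for g, h in cells:
        c = [coef(i, j, g, h) for i, j in pairs]
        bracket = sum(c[m] * V_inv[m][n] * c[n] for m in range(r) for n in range(r))
        correction += (a[g][h] * bracket) ** 2 / nu[g][h]
    critical = chi2_point + 0.5 * (3.0 * chi_4 + chi_2) * correction
    return stat, critical


# Illustrative 2 x 2 lay-out with unit variances and 10 d.f. per cell, 5% level on 1 d.f.
stat, critical = interaction_test([[1.0, 2.0], [3.0, 5.0]],
                                  [[1.0, 1.0], [1.0, 1.0]],
                                  [[10, 10], [10, 10]], 3.8415)
print(stat, critical)
```

In the 2 × 2 case there is a single contrast, and the critical value agrees with the first-order expansion of the squared Student point on the Welch-type degrees of freedom, here 40.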

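The quantity $\mathbf{t}'\mathbf{V}^{-1}\mathbf{t}$ can either be calculated directly, once $\mathbf{V}^{-1}$ has been calculated, or
as the difference between the (weighted) sum of squares due to fitting all the constants and
that due to fitting the main effects only. If we denote the partitioning of the various
matrices involved according to their first r (= (p − 1)(q − 1)) and last k − r (= p + q − 1)
rows and/or columns by the numbers 1 and 2, and in particular if

$$\mathbf{C} = \begin{pmatrix} \mathbf{C}_{11} & \mathbf{C}_{12} \\ \mathbf{C}_{21} & \mathbf{C}_{22} \end{pmatrix}, \qquad \mathbf{C}^{-1} = \begin{pmatrix} \mathbf{C}^{11} & \mathbf{C}^{12} \\ \mathbf{C}^{21} & \mathbf{C}^{22} \end{pmatrix}, \tag{4-20}$$

then this last method uses the identity

$$\mathbf{t}'\mathbf{V}^{-1}\mathbf{t} = \hat{\boldsymbol{\theta}}_1'(\mathbf{C}^{11})^{-1}\hat{\boldsymbol{\theta}}_1 = \hat{\boldsymbol{\theta}}'\mathbf{C}\hat{\boldsymbol{\theta}} - \hat{\boldsymbol{\theta}}_2^{0\prime}\mathbf{C}_{22}\hat{\boldsymbol{\theta}}_2^{0} = \hat{\boldsymbol{\theta}}'\mathbf{d} - \hat{\boldsymbol{\theta}}_2^{0\prime}\mathbf{d}_2, \tag{4-21}$$

where $\hat{\boldsymbol{\theta}}_2^{0}$ is the maximum-likelihood solution for $\boldsymbol{\theta}_2$ under the null hypothesis $\boldsymbol{\theta}_1 = \mathbf{0}$. If
we adopt this well-known method of calculating $\mathbf{t}'\mathbf{V}^{-1}\mathbf{t}$ (and it is the more economical one
unless p and q are quite small), then we can evaluate $\mathbf{C}_{22}^{-1}$ at the same time, and obtain $\mathbf{V}^{-1}$
from the identity

$$\mathbf{V}^{-1} = (\mathbf{C}^{11})^{-1} = \mathbf{C}_{11} - \mathbf{C}_{12}\mathbf{C}_{22}^{-1}\mathbf{C}_{21}, \tag{4-22}$$

rather than by the direct inversion of $\mathbf{V}$.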


(b) Test for a main effect 


We now assume that the interaction constants are zero, so that (4-4)-(4-7) become, after
a trivial change of notation,

$E(x_{ij}) = \phi_i + \psi_j \quad (i = 1,\dots,p-1;\ j = 1,\dots,q),$ (4-23)
$E(x_{pj}) = \psi_j \quad (j = 1,\dots,q).$ (4-24)
The null hypothesis is that the $\phi_i$ vanish. The matrices $C$, $\theta$ and $d$ are (for the case $p = q = 3$)
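$C = \begin{bmatrix} w_{1\cdot} & 0 & w_{11} & w_{12} & w_{13} \\ 0 & w_{2\cdot} & w_{21} & w_{22} & w_{23} \\ w_{11} & w_{21} & w_{\cdot 1} & 0 & 0 \\ w_{12} & w_{22} & 0 & w_{\cdot 2} & 0 \\ w_{13} & w_{23} & 0 & 0 & w_{\cdot 3} \end{bmatrix}, \quad \theta = \begin{bmatrix} \phi_1 \\ \phi_2 \\ \psi_1 \\ \psi_2 \\ \psi_3 \end{bmatrix}, \quad d = \begin{bmatrix} w_{1\cdot}\bar x_{1\cdot} \\ w_{2\cdot}\bar x_{2\cdot} \\ w_{\cdot 1}\bar x_{\cdot 1} \\ w_{\cdot 2}\bar x_{\cdot 2} \\ w_{\cdot 3}\bar x_{\cdot 3} \end{bmatrix},$ (4-25)

where $w_{ij} = 1/a_{ij}, \quad w_{i\cdot} = \sum_{j=1}^{q} w_{ij}, \quad w_{\cdot j} = \sum_{i=1}^{p} w_{ij},$ (4-26)

and $w_{i\cdot}\bar x_{i\cdot} = \sum_{j=1}^{q} w_{ij}x_{ij}, \quad w_{\cdot j}\bar x_{\cdot j} = \sum_{i=1}^{p} w_{ij}x_{ij}.$ (4-27)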
Since the solution $t$ is obviously a function of the $a_{ij}$, this linear hypothesis is an example
of the general case, and we must not be surprised if the application of the formula (2-59)
gives a rather complicated test, even as far as the term of order $\nu^{-1}$.

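From (4-25),

$A = V^{-1} = \begin{bmatrix} w_{1\cdot} & 0 \\ 0 & w_{2\cdot} \end{bmatrix} - \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{bmatrix} \begin{bmatrix} 1/w_{\cdot 1} & 0 & 0 \\ 0 & 1/w_{\cdot 2} & 0 \\ 0 & 0 & 1/w_{\cdot 3} \end{bmatrix} \begin{bmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \\ w_{13} & w_{23} \end{bmatrix}$
$\quad = \begin{bmatrix} w_{1\cdot} - \sum_j w_{1j}^2/w_{\cdot j} & -\sum_j w_{1j}w_{2j}/w_{\cdot j} \\ -\sum_j w_{1j}w_{2j}/w_{\cdot j} & w_{2\cdot} - \sum_j w_{2j}^2/w_{\cdot j} \end{bmatrix},$ (4-28)

or $A_{rs} = v^{rs} = \delta_{rs}w_{r\cdot} - \sum_{j=1}^{q} w_{rj}w_{sj}/w_{\cdot j}.$ (4-29)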
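It readily follows that the typical elements of the first and second derivatives, $A_{ij}$ and
$A_{ij,ij}$, are

$\frac{\partial A_{rs}}{\partial a_{ij}} = -w_{ij}^2\frac{\partial A_{rs}}{\partial w_{ij}} = -w_{ij}^2\Big(\delta_{ir} - \frac{w_{rj}}{w_{\cdot j}}\Big)\Big(\delta_{is} - \frac{w_{sj}}{w_{\cdot j}}\Big),$ (4-30)

$\frac{\partial^2 A_{rs}}{\partial a_{ij}^2} = 2w_{ij}^3\Big(1 - \frac{w_{ij}}{w_{\cdot j}}\Big)\Big(\delta_{ir} - \frac{w_{rj}}{w_{\cdot j}}\Big)\Big(\delta_{is} - \frac{w_{sj}}{w_{\cdot j}}\Big);$ (4-31)

these hold for $i = 1,\dots,p$; $j = 1,\dots,q$; $r,s = 1,\dots,p-1$. No explicit formula is known for
$A^{-1} = V$, but it can of course be evaluated numerically. We now find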


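$\mathrm{tr}\,A^{-1}A_{ij} = -w_{ij}u_{ij},$ (4-32)
$\mathrm{tr}\,A^{-1}A_{ij,ij} = 2w_{ij}^2\Big(1 - \frac{w_{ij}}{w_{\cdot j}}\Big)u_{ij},$ (4-33)
$\mathrm{tr}\,(A^{-1}A_{ij})^2 = (\mathrm{tr}\,A^{-1}A_{ij})^2 = w_{ij}^2u_{ij}^2,$ (4-34)
where $u_{ij} = w_{ij}\sum_{r,s=1}^{p-1} v_{rs}\Big(\delta_{ir} - \frac{w_{rj}}{w_{\cdot j}}\Big)\Big(\delta_{is} - \frac{w_{sj}}{w_{\cdot j}}\Big).$ (4-35)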

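Finally, we have to evaluate $\mathrm{tr}\,A(C^{-1}C_{ij}C^{-1}C_{ij}C^{-1})_{11}$. Again, no explicit formula is
known for $C^{-1}$, and we shall denote it by

$C^{-1} = \begin{bmatrix} c^{11} & c^{12} & c^{11^*} & c^{12^*} & c^{13^*} \\ c^{21} & c^{22} & c^{21^*} & c^{22^*} & c^{23^*} \\ c^{1^*1} & c^{1^*2} & c^{1^*1^*} & c^{1^*2^*} & c^{1^*3^*} \\ c^{2^*1} & c^{2^*2} & c^{2^*1^*} & c^{2^*2^*} & c^{2^*3^*} \\ c^{3^*1} & c^{3^*2} & c^{3^*1^*} & c^{3^*2^*} & c^{3^*3^*} \end{bmatrix},$ (4-36)

the asterisks meaning that the corresponding indices refer to the rows or columns belonging
to the parameters $\psi_j$. Some calculation now gives (for the case $p = q = 3$)

$(C^{-1}C_{ij}C^{-1}C_{ij}C^{-1})_{11} = w_{ij}^4\,(c^{ii} + 2c^{ij^*} + c^{j^*j^*}) \begin{bmatrix} c^{1i} + c^{1j^*} \\ c^{2i} + c^{2j^*} \end{bmatrix} \begin{bmatrix} c^{1i} + c^{1j^*} & c^{2i} + c^{2j^*} \end{bmatrix}$
$\quad (i = 1,\dots,p-1;\ j = 1,\dots,q),$ (4-37)

$(C^{-1}C_{pj}C^{-1}C_{pj}C^{-1})_{11} = w_{pj}^4\,c^{j^*j^*} \begin{bmatrix} c^{1j^*} \\ c^{2j^*} \end{bmatrix} \begin{bmatrix} c^{1j^*} & c^{2j^*} \end{bmatrix} \quad (j = 1,\dots,q).$ (4-38)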
If we agree to interpret the $c$ symbols not occurring in (4-36) as zeros, (4-37) is also true for
$i = p$. Combining this result with (4-29) we have

$\mathrm{tr}\,A(C^{-1}C_{ij}C^{-1}C_{ij}C^{-1})_{11} = w_{ij}^4\,(c^{ii} + 2c^{ij^*} + c^{j^*j^*}) \begin{bmatrix} c^{1i} + c^{1j^*} & c^{2i} + c^{2j^*} \end{bmatrix} A \begin{bmatrix} c^{1i} + c^{1j^*} \\ c^{2i} + c^{2j^*} \end{bmatrix}$
$\quad = w_{ij}^2U_{ij} \quad (i = 1,\dots,p;\ j = 1,\dots,q),$ (4-39)

where $U_{ij} = w_{ij}^2\,(c^{ii} + 2c^{ij^*} + c^{j^*j^*}) \sum_{r,s=1}^{p-1}\Big(\delta_{rs}w_{r\cdot} - \sum_{l=1}^{q}\frac{w_{rl}w_{sl}}{w_{\cdot l}}\Big)(c^{ri} + c^{rj^*})(c^{si} + c^{sj^*}),$ (4-40)

$c$'s with no other meaning being interpreted as zero, as before.
Thus substituting (4-32), (4-33), (4-34), and (4-39) into (2-59) we obtain

$2h(a) = \chi^2 + \sum_i\sum_j \frac{1}{\nu_{ij}}\Big[2\chi_2\Big(2 - 3\frac{w_{ij}}{w_{\cdot j}}\Big)u_{ij} + \tfrac12(3\chi_4 - 11\chi_2)u_{ij}^2 + 2\chi_2U_{ij}\Big] + O(\nu^{-2}),$ (4-41)

where $\chi^2$ is based on $(p-1)$ degrees of freedom, $\chi_{2s}$ denotes $\chi^{2s}/(p-1)(p+1)\dots(p+2s-3)$,
and $u_{ij}$, $U_{ij}$ are given by (4-35) and (4-40) respectively. (Since $V = C^{11}$, the former equation
may be written in terms of the $c^{rs}$, like the latter, if desired.)

Again there are two ways of calculating $t'V^{-1}t$: either directly, or as a difference between
two sums of squares. In either case the greater part of the labour lies in computing the
solution $\hat\theta$, or at any rate the part $\hat\theta_1 = t$. $V^{-1}$ is readily found from (4-28)-(4-29), and $V$ by
inversion.

It should be pointed out that the occasions on which the results of this section may be
used as they stand are likely to be few and far between. Second-order terms are either
unknown or unquoted, and would in any case take a considerable time to compute.
Consequently with only two or three replications $2h(a)$ will not be well determined. The eventual
practical solution to this difficulty may lie in assuming that the variances are given by
some such law as $\alpha_{ij} = \alpha_i + \alpha'_j$ or $\alpha_{ij} = \alpha_i\alpha'_j$, and then estimating the constants $\alpha_i$, $\alpha'_j$,
which are fewer in number than the $\alpha_{ij}$. Put in another way, we could use our present
estimates $a_{ij}$ to provide 'smoothed' estimates $a^*_{ij}$ obeying such a law, thereby gaining (in
an intuitive sense) degrees of freedom in our smoothed estimates. We would pay for this
increase in that the distributions of the $a^*_{ij}$ would be complicated and non-independent.
This situation may not be quite as hopeless as it appears at first sight, for it may be shown
that (reverting to the notation of § 2) whatever the joint distribution of the $a_i$ is, provided

$E(a_i) = \alpha_i + o(\nu^{-1}),$ (4-42)
$\mathrm{cov}\,(a_i, a_j) = O(\nu^{-1}),$ (4-43)

and the higher cumulants are $o(\nu^{-1})$, we have

$\Theta = 1 + \tfrac12\sum_i\sum_j \partial_i\partial_j\,\mathrm{cov}\,(a_i, a_j) + o(\nu^{-1}),$ (4-44)

which reduces to (2-12) when the $a_i$ have independent $\chi^2$-type distributions. Thus it may be
practicable to set up a first-order theory for this and other more complicated situations.


5. A MULTIVARIATE LINEAR HYPOTHESIS 
Suppose that $x_1, \dots, x_n$ are $n$ $p$-variate vectors whose expected values are known linear
functions of $k$ unknown parameters $\theta_1, \dots, \theta_k$, and that they are independently distributed
around these expected values in $p$-variate normal distributions whose variance matrices
are $\alpha_1, \dots, \alpha_n$. In matrix notation we again have

$E(x) = B\theta,$ (5-1)

but $x$ now denotes the column vector $\{x_1, \dots, x_n\}$, and $B$, the matrix of known constants,
is now of size $np \times k$. Let $A_1, \dots, A_n$ be unbiased estimates of $\alpha_1, \dots, \alpha_n$, distributed
independently of the $x_i$ and of each other in Wishart forms with $\nu_1, \dots, \nu_n$ degrees of freedom.
The null hypothesis to be tested is, as before, that $\theta_1 = \dots = \theta_r = 0$, where $r$ is some
number between 1 and $k$. If we knew the $\alpha_i$ the appropriate test would again be based on
the equation

$\Pr[t'V^{-1}t(\alpha) \le 2\xi] = G_r(\xi),$ (5-2)

the symbols having the same meanings as in § 2, except that $\alpha$ in (2-3) now denotes
$\mathrm{diag}\,(\alpha_1, \dots, \alpha_n)$. As we do not know the $\alpha_i$, we attempt to find a function $h(a)$ of the
elements $a_{irs}$ of the matrices $A_i$, which is such that

$\Pr[t'V^{-1}t(a) \le 2h(a)] = G_r(\xi).$ (5-3)

Now $\Pr[t'V^{-1}t(a) \le 2h(a)] = \int \Pr[t'V^{-1}t(a) \le 2h(a) \mid a]\,\Pr[da],$ (5-4)

where the first expression on the right denotes the conditional probability of the relation
indicated for fixed values of the $a_{irs}$, and the second denotes the product of the probability
elements of the $n$ Wishart distributions of the $a_{irs}$. Now


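$\Pr[t'V^{-1}t(a) \le 2h(a) \mid a] = \exp\Big[\sum_i\sum_{r \le s}(a_{irs} - \alpha_{irs})\frac{\partial}{\partial\alpha_{irs}}\Big]\Pr[t'V^{-1}t(\alpha) \le 2h(\alpha)]$
$\quad = \exp\Big[\sum_i \mathrm{tr}\,(A_i - \alpha_i)\partial_i\Big]\Pr[t'V^{-1}t(\alpha) \le 2h(\alpha)],$ (5-5)

where $\partial_i$ denotes the matrix of derivative operators:

$\partial_i = \begin{bmatrix} \partial/\partial\alpha_{i11} & \dots & \tfrac12\partial/\partial\alpha_{i1p} \\ \vdots & & \vdots \\ \tfrac12\partial/\partial\alpha_{ip1} & \dots & \partial/\partial\alpha_{ipp} \end{bmatrix},$ (5-6)

its typical element being $\partial_{irs} = \tfrac12(1 + \delta_{rs})\,\partial/\partial\alpha_{irs}.$ (5-7)
Thus (5-3) becomes $\Theta\,\Pr[t'V^{-1}t(\alpha) \le 2h(\alpha)] = G_r(\xi),$ (5-8)
where $\Theta = \int \exp\Big[\sum_i \mathrm{tr}\,(A_i - \alpha_i)\partial_i\Big]\Pr[da].$ (5-9)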


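We now see that $\Theta$ is $\exp[-\sum_i \mathrm{tr}\,\alpha_i\partial_i]$ times the product of the moment-generating functions
of the $n$ Wishart distributions (the 'variables' being $\partial_{irs}$), or

$\Theta = \exp\Big[-\sum_i \mathrm{tr}\,\alpha_i\partial_i\Big]\prod_i\Big|I - \frac{2}{\nu_i}\alpha_i\partial_i\Big|^{-\frac12\nu_i}.$ (5-10)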
i 
To proceed, we require to prove the theorem enunciated in § 2, namely,

$-\log|I - Y| = \mathrm{tr}\,Y + \tfrac12\mathrm{tr}\,Y^2 + \tfrac13\mathrm{tr}\,Y^3 + \dots$ (5-11)

(the series being convergent, in the case when $Y$ is a numerical matrix, when all the eigenvalues
of $Y$ are numerically less than unity). Suppose that the eigenvalues of $Y$ are $\lambda_1, \dots, \lambda_p$.
It can be shown that the eigenvalues of $Y^r$ are $\lambda_1^r, \dots, \lambda_p^r$, for $r = 1, 2, \dots$. Thus

$|\omega I - Y^r| = (\omega - \lambda_1^r)\dots(\omega - \lambda_p^r) \quad (r = 1, 2, \dots).$ (5-12)
Therefore $\mathrm{tr}\,Y^r = \sum_j \lambda_j^r.$ (5-13)

Also, using (5-12) with $r = 1$, $\omega = 1$,

$-\log|I - Y| = -\sum_j \log(1 - \lambda_j)$
$\quad = \sum_j \lambda_j + \tfrac12\sum_j \lambda_j^2 + \tfrac13\sum_j \lambda_j^3 + \dots$
$\quad = \mathrm{tr}\,Y + \tfrac12\mathrm{tr}\,Y^2 + \tfrac13\mathrm{tr}\,Y^3 + \dots,$ (5-14)

by (5-13), which proves the theorem.
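
The theorem (5-11) lends itself to a quick numerical check. The sketch below is only an illustration: the $2 \times 2$ matrix `Y` is an arbitrary choice whose eigenvalues (about $0.21$ and $-0.31$) are numerically less than unity, and the partial sum is truncated at 60 terms.

```python
import math

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

# Arbitrary numerical matrix with eigenvalues inside the unit circle.
Y = [[0.2, 0.1],
     [0.05, -0.3]]
I_minus_Y = [[(1.0 if i == j else 0.0) - Y[i][j] for j in range(2)]
             for i in range(2)]

lhs = -math.log(det2(I_minus_Y))

# Partial sum of tr Y + (1/2) tr Y^2 + (1/3) tr Y^3 + ...
series = 0.0
power = [row[:] for row in Y]
for r in range(1, 61):
    series += trace(power) / r
    power = matmul(power, Y)
```

With these values `lhs` and `series` agree to rounding error. The series would diverge if an eigenvalue of `Y` exceeded unity in absolute value, which is why the convergence condition is stated with the theorem.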
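Applying this result to (5-10) we obtain

$\Theta = \exp\sum_i\Big[-\mathrm{tr}\,\alpha_i\partial_i - \tfrac12\nu_i\log\Big|I - \frac{2}{\nu_i}\alpha_i\partial_i\Big|\Big]$
$\quad = \exp\sum_i\Big[\frac{1}{\nu_i}\mathrm{tr}\,(\alpha_i\partial_i)^2 + \frac{4}{3\nu_i^2}\mathrm{tr}\,(\alpha_i\partial_i)^3 + \dots\Big]$
$\quad = 1 + \sum_i\frac{1}{\nu_i}\mathrm{tr}\,(\alpha_i\partial_i)^2 + \Big\{\frac43\sum_i\frac{1}{\nu_i^2}\mathrm{tr}\,(\alpha_i\partial_i)^3 + \frac12\Big(\sum_i\frac{1}{\nu_i}\mathrm{tr}\,(\alpha_i\partial_i)^2\Big)^2\Big\} + O(\nu^{-3}).$ (5-15)

The analogy with Welch's $\Theta$ operator is obvious. It should be noted that the operators
$\partial_i$ do not act on the $\alpha_i$ present in $\Theta$ itself, and indeed it is more useful to write (5-15) in
suffix form: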

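$\Theta = 1 + \sum_i\sum_{rstu}\frac{1}{\nu_i}\alpha_{iur}\alpha_{ist}\partial_{irs}\partial_{itu}$
$\quad + \Big\{\frac43\sum_i\sum_{rstuvw}\frac{1}{\nu_i^2}\alpha_{iwr}\alpha_{ist}\alpha_{iuv}\partial_{irs}\partial_{itu}\partial_{ivw} + \frac12\sum_{ij}\sum_{rstuvwxy}\frac{1}{\nu_i\nu_j}\alpha_{iur}\alpha_{ist}\alpha_{jyv}\alpha_{jwx}\partial_{irs}\partial_{itu}\partial_{jvw}\partial_{jxy}\Big\} + O(\nu^{-3}).$ (5-16)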
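Proceeding from (5-8) and (5-16) as in the univariate case we obtain, for the equations giving
$h_1(a)$ and $h_2(a)$,

$\Big[h_1(\alpha)D + \sum_i\sum_{rstu}\frac{\alpha_{iur}\alpha_{ist}}{\nu_i}\partial_{irs}\partial_{itu}\Big]\Pr[t'V^{-1}t(\alpha) \le 2\xi] = 0,$ (5-17)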


ec )D+ thi(a a) D?+=>- “wri ist (Airs, tw(¢) D+ 2hGr9)( ) a; D 


itu 
i 


+ hy(&) Oip¢ 9 gD) + $2 Xie Vist Ciww Pine ites Pero 











irs“ itu py? 
re 3=z Line List Xjyy “te Oitu Osow “se Pr r[t’ V-t(a (a)< < 2¢] = (5-18) 
where $h_{1(irs)}(\alpha) = \partial_{irs}h_1(\alpha) = \tfrac12(1 + \delta_{rs})\,\partial h_1(\alpha)/\partial\alpha_{irs}$, etc. (5-19)
To evaluate the derivatives we consider

$J = \Pr[t'V^{-1}t(\alpha + \epsilon) \le 2\xi],$ (5-20)

where the increments $\epsilon_1, \dots, \epsilon_n$ to $\alpha_1, \dots, \alpha_n$ are symmetric matrices. As in equation (2-17)
this is equivalent to

$J = \big[1 + \sum \epsilon_{irs}\partial_{irs} + \tfrac12\sum\sum \epsilon_{irs}\epsilon_{jtu}\partial_{irs}\partial_{jtu} + \dots\big]\Pr[t'V^{-1}t(\alpha) \le 2\xi],$ (5-21)

the sums being over $i, j, \dots = 1, \dots, n$; $r, s, \dots = 1, \dots, p$. But we can also express $J$ in the
form (2-18), and carrying out similar transformations and expansions on this to the ones used
in § 2 we find, in the simple case when $t(\alpha)$ is not actually a function of the $\alpha_{irs}$,


J = (1+ 2%e,,,{ — }(irs) A} 
+ TLE jpsEjn{ (irs | jtu) ($A + ZA?) + Hirs) (jtu) A*}+...]@,(£), (5-22) 
where $(irs) = \mathrm{tr}\,V^{-1}V_{irs}$, $(irs, jtu) = \mathrm{tr}\,V^{-1}V_{irs,jtu}$, $(irs \mid jtu) = \mathrm{tr}\,V^{-1}V_{irs}V^{-1}V_{jtu}$, … (5-23)
and $V_{irs} = \partial_{irs}V = \tfrac12(1 + \delta_{rs})\,\partial V/\partial\alpha_{irs}$, $V_{irs,jtu} = \partial_{irs}\partial_{jtu}V$, … (5-24)
Thus we find Oirs Pr [t'V—*t(xx) < 2] = $(irs) Hg ,(&), (5-25) 
Oirs Ojty Ev [t’'V-t(x) < 2€] = [—4(irs | jtu) (H? + £) — }(irs) (jtu) (H?—E)]g,(€), (5°26) ’ 
and similar modifications of (2-41) and (2-42). The rest of the work is similar to that in § 2, ' 
and the final result is 
$2h(a) = \chi^2 + \tfrac12\sum_i\sum_{rstu}\frac{a_{iur}a_{ist}}{\nu_i}\big[2(\chi_4 + \chi_2)(irs \mid itu) + (\chi_4 - \chi_2)(irs)(itu)\big]$
1/, r-2 5 Viup Vigh a ewee . | 
+ Fr (1 ” =) b a . Y; [2(X4 + Xz) (irs | itu) + (X4— Xz) (irs) ctw) ; 


-is > te 2(X4 + Xe) (dur | ist) + (irs | itw)) + (X4—_Xzq) ((iur) (ist) + (irs) (itu))] 


4 rstu vy? 





+ yr Vir Vist Vir ° oa ; . ; ; 
+4% 2 aad eae [2(X%4 + Xe) (tr | itu | cow) + (X4— Xe) (irs | itu) (ivw)] 
a fl v 
a 2 Vir Vist Mie pTIwex 2 at Xe) (2(irs | itu | jow | jry) + (irs | jow | itu | jay)) 
ij rstu V; V; 
atid at X2) (2(irs | itu | jouw) (jay) + (irs | jow) (itu | jry))] 
Cae eats. pep oe Mae ga” 
+(X2—-1)% ~ a [2(X4 + Xz) (irs | tte) (tvw) + (Xq — Xe) (i78) (itu) (tvw)] 
“oe ? 
D jy hy ds P ae «7 : 
-—(x%-1l=z = Fur Vist Vive PSF 12(x4+ Xz) (irs | itu | jow) (jay) 
7] retu Yi 7 . . ° . 
yy + (Xa Xz) (irs | juw) (itu) (jxy)] 
Diep Vjyy By a ee 
z=. 4 ye > Diur' List € “dye *jwx [2(x4— 1) (irs | itu) +(x4- 2y. ap 1) (irs) (itu) ] 
yj rstu V; V; 4 r 
vwry 


x [2(X%4 + Xe) (Gow | jay) + (Xa— Xa) (Jvw) (Jry)] 


-4>>- Sexo nia Tiue [8(X6 + Xa + Xe) (irs | itu | tow) + 6(x5— Xp) (irs | itu) (ivw) 


i rst 
we + (Xe— 2X4 + Xa) (irs) (itu) (ivw)] 
a 
+2 ¥ ciur “ist “jue “jwe 
ij retu Viv; 
vwry 


x [16(X%3 + Xe + Xa t+ Xe) (2(irs | itu | jow | jay) + (irs | jow | itu | jay)) 
+ 32(xs— Xe) (irs | itu | jow) (jxy) 
+ 4(X3+ Xe — Xa— Xz) (irs | itu) (jow | jay) + 2(irs | jow) (itu | jay)) 
+ 4(X3— Xe— Xa + Xz) ((tr8 | itu) (jow) (jay) + 2irs | juw) (itu) (jry)) 
+ (Xs — 3X6 + 3X4 — Xe) (278) (ttw) (juw) (jry)]} + O(v-*). (5°27) 


Here $\chi_{2s}$ denotes $\chi^{2s}/r(r+2)\dots(r+2s-2)$ while $(irs)$ now denotes $\mathrm{tr}\,V^{-1}(a)V_{irs}(a)$, where
$V_{irs}(a) = \tfrac12(1 + \delta_{rs})\,\partial V(a)/\partial a_{irs}$, and so on.
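In the general case, where $t(\alpha)$ is actually a function of the $\alpha_{irs}$, very similar analysis
to that given in § 2 finally yields

$2h(a) = \chi^2 + (p+1)\chi_2\sum_i\sum_{rs}\frac{a_{irs}}{\nu_i}\mathrm{tr}\,A^{-1}A_{irs}$
$\quad + \sum_i\sum_{rstu}\frac{a_{iur}a_{ist}}{\nu_i}\big[3\chi_2\,\mathrm{tr}\,A^{-1}A_{irs,itu} + (\chi_4 - 5\chi_2)\,\mathrm{tr}\,A^{-1}A_{irs}A^{-1}A_{itu}$
$\quad + \tfrac12(\chi_4 - \chi_2)(\mathrm{tr}\,A^{-1}A_{irs})(\mathrm{tr}\,A^{-1}A_{itu}) + 2\chi_2\,\mathrm{tr}\,A(C^{-1}C_{irs}C^{-1}C_{itu}C^{-1})_{11}\big] + O(\nu^{-2}),$ (5-28)

the analogue of (2-59). ($A$ denotes $V^{-1}$.)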

An immediate generalization of the above results is to the case when the vector x; contains 
p; variates, the p; not being necessarily equal; apart from altering the (implicit) limits on 
summation signs, the work remains virtually unchanged. 

Another generalization is analogous to that given at the end of § 2; in place of var x; = a; 
we may write 

var x; = S E Basa j> COV(X;,X;)=90 (¢+)). (5-29) 
j= 


Both generalizations may be covered by writing


cov (Xi, is) om: >> p> 1 Birs Xitus cov (x ‘ir? Lis) = = 0 (0 +j). (5-30) 


6. THE MULTIVARIATE ANALOGUE OF THE BEHRENS-FISHER PROBLEM 

Suppose that we have samples from two $p$-variate normal populations, whose true centroids
are $\mu_1$, $\mu_2$ and whose variance matrices are unknown and not assumed to be equal. Let
$x_1$, $x_2$ denote the centroids of the samples, and let the sampling variance matrices of these
centroids be $\alpha_1 = (\alpha_{1rs})$, $\alpha_2 = (\alpha_{2rs})$, while $A_1 = (a_{1rs})$, $A_2 = (a_{2rs})$ are the usual unbiased
estimates of $\alpha_1$, $\alpha_2$, distributed in Wishart's forms with $\nu_1$, $\nu_2$ degrees of freedom respectively.
We wish to test the hypothesis $\mu_1 = \mu_2$; this becomes $\delta = 0$ if we write $\delta = \mu_1 - \mu_2$. The
maximum-likelihood estimate of $\delta$ is $t = x_1 - x_2$, which is independent of the unknown
variance matrices, and its sampling variance matrix is $V(\alpha) = \alpha_1 + \alpha_2$. Thus if $\alpha_1$ and $\alpha_2$
were known we could test our hypothesis by comparing $t'V^{-1}(\alpha)\,t$ with a significance point
of $\chi^2$ on $p$ degrees of freedom, but as they are not we compare $t'V^{-1}(a)\,t = t'V^{-1}t$ with $2h(a)$
given by (5-27).

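Now $V_{irs}$ is a matrix with zeros everywhere except for $\tfrac12$'s in the places $(r,s)$ and $(s,r)$
(or 1 in the place $(r,r)$ if $r = s$). Thus

$(irs) = \mathrm{tr}\,V^{-1}V_{irs} = \sum_{ab} v^{ab}\,\tfrac12(\delta_{br}\delta_{as} + \delta_{bs}\delta_{ar})$
$\quad = \tfrac12(v^{sr} + v^{rs}) = v^{rs},$ (6-1)

$(irs \mid jtu) = \sum_{abcd} v^{ab}\,\tfrac12(\delta_{br}\delta_{cs} + \delta_{bs}\delta_{cr})\,v^{cd}\,\tfrac12(\delta_{dt}\delta_{au} + \delta_{du}\delta_{at})$
$\quad = \tfrac14(v^{ur}v^{st} + v^{us}v^{rt} + v^{tr}v^{su} + v^{ts}v^{ru})$
$\quad = \tfrac12(v^{ur}v^{st} + v^{us}v^{rt}).$ (6-2)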

(The final reduction to half the number of terms is a peculiarity not shared by traces
of higher order; apart from this it will be seen that the totality of the terms is obtained
from the first by interchanging the letter pairs $r,s$; $t,u$; … occurring together on the
left-hand sides.) In this fashion we find

$\sum_{rstu} a_{iur}a_{ist}(irs \mid itu) = \tfrac12([i \mid i] + [i]^2),$ (6-3)
$\sum_{rstu} a_{iur}a_{ist}(irs)(itu) = [i \mid i],$ (6-4)
where $[i] = \mathrm{tr}\,V^{-1}A_i$, $[i \mid i] = \mathrm{tr}\,V^{-1}A_iV^{-1}A_i$, …, and $V = \sum A_i$. (6-5)
Hence $2h(a) = \chi^2 + \tfrac12\sum_i\frac{1}{\nu_i}\big(2\chi_4[i \mid i] + (\chi_4 + \chi_2)[i]^2\big) + O(\nu^{-2}).$ (6-6)

Here $\chi^2$ is based on $p$ degrees of freedom and $\chi_{2s}$ denotes $\chi^{2s}/p(p+2)\dots(p+2s-2)$. The
term of order $\nu^{-2}$, obtained from further formulae like (6-3) and (6-4), over which a good
deal of care is necessary, is

= 2 
2hy(a) = 75 (1— Ps) [E> 2xall |] + Cet xo | 


— D7 *((2X4 + Xo) [4 | 1] + xalt}?) 
+ Lvz*(2(3X4 + Xe) [4 | + | 7] + (5X4 + Xe) [4 | 2] 2] + (Hq +22) [7F) 
— 2(, Vj) (2xalt | | J | J1+ (2Xa+ Xe) (0 | Ft | J+ (3a + Xe) [e | | JL] 
+ Xalt | JP + (Xa + Xe) [| 71 (1) 
+ (X2— 1) Dz ?(2yQft | t| 4) + (a+ Xa) [| 1] [4] 
— (X_— 1) X(v,v;)* (2X, [t t{¢| 5 | j)+(xa+Xe) [¢|¢| I) 
— $2( ti )~* (2(X4— Xa) [é | 4] + (Xa — 1) (EP?) (2xaL9 | 51+ (Xa t+ X2) VP) 
— Flv7*(2(4%6 + X4+ Xe) [i | | 1] + 3(2X%6 + Xa) [4 | 4] 2] + (Xe + Xa + Xa) [41) 
+ FeX(V; oe (32xpLt | ¢| 7 | J] + 8(2Xe+ 2X6 + Xat Xe) [é| J] *] J] 
+ 16(2X5+ Xe + Xa) [4 | #| [9] + 4(Xs— Xo) [¢ | 4) | 3] 
+ 8(X3+ Xo) [¢ | J]? +4(Xs— Xa) [2 | IP 
+ 8(X3 + Xe +Xat Xe) [| J1[] LI) + (Xe + Xo — Xa — Xe) [4]? (37?). (6-7) 
This result has been checked by treating the problem on its own merits, and not as an 
example of the general linear hypothesis; however this method seems to the author to 


be at least as confusing as the one adopted here. 
In the univariate case the result reduces to 














> La3/v? 
2h(a) = x¢[1+402 +1) tt {OF Dee ea (x4 + 5y2 +e 
Da?/y,)2 
— Hat + 15y2+ 7) SPP 500-9), (6-8) 


which is equivalent to Welch’s (1947 a) result. 

Of course there is nothing to stop us considering, with Welch, the slightly more general
problem where $y$ is a vector having a $p$-variate normal distribution with centroid $\eta$ and
variance matrix† $V(\alpha) = \alpha_1 + \dots + \alpha_k$, while $A_1, \dots, A_k$ are distributed independently in
Wishart distributions with expectations $\alpha_1, \dots, \alpha_k$ and degrees of freedom $\nu_1, \dots, \nu_k$. The
test for the hypothesis $\eta = 0$ (say) is then to compare $y'V^{-1}(a)\,y$ with $2h(a)$ given by (6-6)
and (6-7), the sums therein now running from 1 to $k$ instead of from 1 to 2.


† Or $V(\alpha) = \sum\lambda_i\alpha_i$; the inclusion or omission of constant multipliers is a trivial matter since they
can always be absorbed into the $\alpha_i$ and $A_i$.

7. A TEST FOR THE EQUALITY OF THE CENTROIDS OF SEVERAL
MULTIVARIATE POPULATIONS


We now extend the work of the preceding section to the case of samples from $k$ populations.
Using the same notation as before, we write

$E(x_i) = \theta_i + \theta_k \quad (i = 1, \dots, k-1),$ (7-1)
$E(x_k) = \theta_k,$ (7-2)
the $x_i$ being the sample centroids, and the hypothesis to be tested being that
$\theta_1 = \dots = \theta_{k-1} = 0.$
The maximum-likelihood solution gives us
$t_i = x_i - x_k \quad (i = 1, \dots, k-1).$ (7-3)
Denoting $\{t_1, \dots, t_{k-1}\}$ by $t$, its estimated variance matrix is

$V = \begin{bmatrix} A_1 + A_k & A_k & \dots & A_k \\ A_k & A_2 + A_k & \dots & A_k \\ \vdots & \vdots & & \vdots \\ A_k & A_k & \dots & A_{k-1} + A_k \end{bmatrix},$ (7-4)

and $V^{-1} = \begin{bmatrix} W_1 - W_1W^{-1}W_1 & -W_1W^{-1}W_2 & \dots & -W_1W^{-1}W_{k-1} \\ -W_2W^{-1}W_1 & W_2 - W_2W^{-1}W_2 & \dots & -W_2W^{-1}W_{k-1} \\ \vdots & \vdots & & \vdots \\ -W_{k-1}W^{-1}W_1 & -W_{k-1}W^{-1}W_2 & \dots & W_{k-1} - W_{k-1}W^{-1}W_{k-1} \end{bmatrix},$ (7-5)

where the $A_i$ are the estimates of $\alpha_i = \mathrm{var}\,x_i$ based on $\nu_i$ degrees of freedom and where the
'weight matrices' $W_i$ and $W$ are defined by

$W_i = A_i^{-1}, \quad W = \sum_{i=1}^{k} W_i.$ (7-6)
i 
As for the derivatives, if $i \ne k$ we see that $V_{irs}$ is a matrix with zeros everywhere except for
$\tfrac12$'s in the places $(ir, is)$ and $(is, ir)$, or, if $r = s$, except for unity in the place $(ir, ir)$; on the
other hand, $V_{krs}$ has all its elements zero except for $\tfrac12$'s in the places $(ir, js)$ and $(is, jr)$, for
all $i, j = 1, \dots, k-1$, or, if $r = s$, except for units in all the places $(ir, jr)$. Thus (compare
equations (6-1) and (6-2))

$(irs) = v^{is,ir},$ (7-7)
$(irs \mid jtu) = \tfrac12(v^{ju,ir}v^{is,jt} + v^{ju,is}v^{ir,jt}),$ (7-8)

where $v^{ir,js}$ denotes the $(ir, js)$th element of $V^{-1}$. Strictly speaking these formulae only
apply when $i, j \ne k$, but if (see (7-5)) we interpret $v^{ir,js}$ as the $(r,s)$th element of
$(\delta_{ij}W_i - W_iW^{-1}W_j)$,
even when $i$ or $j$ is equal to $k$, they can be shown to hold universally (by the same kind of
appeal to the invariance of traces under linear transformations of the $t$'s as was made in the
univariate case). Hence, substituting these results into the simple case formula (5-27)
we obtain

$2h(a) = \chi^2 + \tfrac12\sum_i\sum_{rstu}\frac{a_{iur}a_{ist}}{\nu_i}\big((\chi_4 + \chi_2)(v^{iu,ir}v^{is,it} + v^{iu,is}v^{ir,it}) + (\chi_4 - \chi_2)v^{is,ir}v^{iu,it}\big) + O(\nu^{-2})$
$\quad = \chi^2 + \tfrac12\sum_i\frac{1}{\nu_i}\big(2\chi_4\,\mathrm{tr}\,(I - W^{-1}W_i)^2 + (\chi_4 + \chi_2)[\mathrm{tr}\,(I - W^{-1}W_i)]^2\big) + O(\nu^{-2}),$ (7-9)
V; 
where $\chi^2$ is based on $r = p(k-1)$ degrees of freedom and $\chi_{2s}$ denotes $\chi^{2s}/r(r+2)\dots(r+2s-2)$.
The term of order $\nu^{-2}$ is given by the same formula (6-7) that applies in the problem of two
centroids, provided that we reinterpret the trace symbols as follows:


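and so on. These reduce to the former expressions $\mathrm{tr}\,V^{-1}A_iV^{-1}A_i$, etc., in the case $k = 2$.
The full set of formulae necessary to interpret (6-7) is

$[i] = \mathrm{tr}\,(I - W^{-1}W_i),$ (7-11)
$[i \mid i] = \mathrm{tr}\,(I - W^{-1}W_i)^2,$ (7-12)
$[i \mid j] = \mathrm{tr}\,(\delta_{ij}I - W^{-1}W_i)(\delta_{ij}I - W^{-1}W_j),$ (7-13)
$[i \mid i \mid i] = \mathrm{tr}\,(I - W^{-1}W_i)^3,$ (7-14)
$[i \mid i \mid j] = \mathrm{tr}\,(I - W^{-1}W_i)(\delta_{ij}I - W^{-1}W_i)(\delta_{ij}I - W^{-1}W_j),$ (7-15)
$[i \mid i \mid j \mid j] = \mathrm{tr}\,(I - W^{-1}W_i)(\delta_{ij}I - W^{-1}W_i)(I - W^{-1}W_j)(\delta_{ij}I - W^{-1}W_j),$ (7-16)
$[i \mid j \mid i \mid j] = \mathrm{tr}\,[(\delta_{ij}I - W^{-1}W_i)(\delta_{ij}I - W^{-1}W_j)]^2.$ (7-17)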


The result could obviously be written out explicitly in the form of the equation (3-30),
but as the expression would be lengthier, and involve a more extensive set of $R$ symbols,
it is not proposed to do so here.
The quantity which has to be compared with $2h(a)$ is $t'V^{-1}t$, which, from equations (7-3)
and (7-5), is given by

$t'V^{-1}t = \sum_i x_i'W_ix_i - \sum_i\sum_j x_i'W_iW^{-1}W_jx_j$
$\quad = \sum_i (x_i - \bar x)'W_i(x_i - \bar x),$ (7-18)
provided that we define $\bar x = W^{-1}\sum_i W_ix_i.$ (7-19)


This is the analogue of the weighted sum of squares $\sum w_ix_i^2 - (\sum w_ix_i)^2/w = \sum w_i(x_i - \bar x)^2$
occurring in the corresponding univariate problem. On large-sample theory it would be
compared with $\chi^2$ based on $r = p(k-1)$ degrees of freedom, and equations (7-9) and (6-7),
interpreted with the help of (7-11)-(7-17), give further approximations.
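
The equality in (7-18) between the quadratic form in the differences $t_i = x_i - x_k$ and the weighted sum of squares about $\bar x$ can be confirmed numerically. The sketch below uses made-up variance matrices and centroids for $k = 3$, $p = 2$; all names and numbers are hypothetical and serve only to exercise (7-3), (7-4), (7-6), (7-18) and (7-19).

```python
def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, M[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for k in range(n):
            if k != col:
                f = aug[k][col]
                aug[k] = [vk - f * vc for vk, vc in zip(aug[k], aug[col])]
    return [aug[i][n] for i in range(n)]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical estimated variance matrices A_i and centroids x_i.
A = [[[2.0, 0.5], [0.5, 1.0]],
     [[1.0, -0.2], [-0.2, 3.0]],
     [[1.5, 0.3], [0.3, 2.0]]]
x = [[1.0, 2.0], [0.0, 1.5], [2.0, -1.0]]
k, p = 3, 2

# Weighted form, (7-6), (7-18), (7-19).
W_i = [inv2(Ai) for Ai in A]
W = [[sum(Wi[r][s] for Wi in W_i) for s in range(p)] for r in range(p)]
sum_Wx = [sum(matvec(Wi, xi)[r] for Wi, xi in zip(W_i, x)) for r in range(p)]
xbar = matvec(inv2(W), sum_Wx)
weighted_ss = 0.0
for Wi, xi in zip(W_i, x):
    dev = [xi[r] - xbar[r] for r in range(p)]
    weighted_ss += dot(dev, matvec(Wi, dev))

# Direct form: t from (7-3), V from (7-4), then t' V^{-1} t.
t = [x[i][r] - x[k - 1][r] for i in range(k - 1) for r in range(p)]
V = [[(A[i][r][s] if i == j else 0.0) + A[k - 1][r][s]
      for j in range(k - 1) for s in range(p)]
     for i in range(k - 1) for r in range(p)]
direct = dot(t, solve(V, t))
```

The two values `weighted_ss` and `direct` coincide to rounding error, so in practice only the $p \times p$ matrices $W_i$ and $W$ need be inverted, not the $p(k-1) \times p(k-1)$ matrix $V$.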


8. NUMERICAL ILLUSTRATIONS OF MULTIVARIATE TESTS 


Suppose that we have three samples, from bivariate normal populations, of sizes 16, 11, 11,
and that their means or centroids are

$x_1 = \begin{bmatrix} 9.82 \\ 15.06 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 13.05 \\ 22.57 \end{bmatrix}, \quad x_3 = \begin{bmatrix} \dots \\ \dots \end{bmatrix},$ (8-1)

and their variance matrices (using divisors $\nu_1 = 15$, $\nu_2 = 10$, $\nu_3 = 10$) are

$\begin{bmatrix} 120.0 & -16.3 \\ -16.3 & 17.8 \end{bmatrix}, \quad \begin{bmatrix} 81.8 & 32.1 \\ 32.1 & 53.8 \end{bmatrix}, \quad \begin{bmatrix} 100.3 & 23.2 \\ 23.2 & 97.1 \end{bmatrix}.$ (8-2)
The estimated variance matrices of the means $x_i$, obtained by dividing (8-2) by the
sample sizes, are

$A_1 = \begin{bmatrix} 7.500 & -1.019 \\ -1.019 & 1.112 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 7.436 & 2.918 \\ 2.918 & 4.891 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 9.118 & 2.109 \\ 2.109 & 8.827 \end{bmatrix}.$ (8-3)
Purely as a convenience, we shall utilize all three samples to illustrate the test of §7, 


and the first two samples to illustrate that of §6, but we shall not be concerning ourselves 
with questions arising from the non-independence of the two tests. 


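(a) Test for two centroids
In the notation of § 6 the difference between the two sample means is

$t = x_1 - x_2 = \begin{bmatrix} -3.23 \\ -7.51 \end{bmatrix},$ (8-4)

and its estimated variance matrix is

$V = A_1 + A_2 = \begin{bmatrix} 14.936 & 1.899 \\ 1.899 & 6.003 \end{bmatrix}.$ (8-5)

Thus $t'V^{-1}t = \begin{bmatrix} -3.23 & -7.51 \end{bmatrix} \begin{bmatrix} 0.06976 & -0.02207 \\ -0.02207 & 0.17357 \end{bmatrix} \begin{bmatrix} -3.23 \\ -7.51 \end{bmatrix} = 9.45.$ (8-6)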


On large-sample theory this would merely be compared with the appropriate significance
point of $\chi^2$ with two degrees of freedom (corresponding to the two hypothetical restrictions,
$\mu_{11} = \mu_{12}$, $\mu_{21} = \mu_{22}$). A selection of these significance points is given in the second column of
Table 1, from which it will be seen that on this test the centroids would be judged to be
significantly different at the 1 % level. However, the large-sample test does not take into
account the fact that $A_1$ and $A_2$ are only estimates of the true variance matrices $\alpha_1$ and $\alpha_2$,
and so it tends to exaggerate the significance of the difference. The use of the corrected form
(6-6) will at least give some idea of the effect of this 'Studentization'; Welch's experience
with the univariate problem suggests that the first corrective term may account for most of
the 'Student' correction, but in any case its use will at least guard us against drawing
over-enthusiastic conclusions from the large-sample test.

Table 1

Significance
level (%)        χ²        A + Bχ²      2h(a)
    5           5.991       1.207        7.23
    2½          7.378       1.244        9.18
    1           9.210       1.292       11.90

Now (6-6) may be written in the form

$2h(a) = \chi^2(A + B\chi^2),$ (8-7)

where $A = 1 + \frac{1}{2p}\sum_i\frac{(\mathrm{tr}\,V^{-1}A_i)^2}{\nu_i},$ (8-8)

$B = \frac{1}{p(p+2)}\Big[\sum_i\frac{\mathrm{tr}\,(V^{-1}A_i)^2}{\nu_i} + \frac12\sum_i\frac{(\mathrm{tr}\,V^{-1}A_i)^2}{\nu_i}\Big].$ (8-9)
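Taking $V^{-1}$ from (8-6) we find

$V^{-1}A_1 = \begin{bmatrix} 0.5457 & -0.0956 \\ -0.3424 & 0.2155 \end{bmatrix}, \quad V^{-1}A_2 = \begin{bmatrix} 0.4543 & 0.0956 \\ 0.3424 & 0.7845 \end{bmatrix};$ (8-10)

$\mathrm{tr}\,(V^{-1}A_1)^2 = 0.4097, \quad \mathrm{tr}\,(V^{-1}A_2)^2 = 0.8873;$† (8-11)
$(\mathrm{tr}\,V^{-1}A_1)^2 = 0.5794, \quad (\mathrm{tr}\,V^{-1}A_2)^2 = 1.5346;$ (8-12)

$\sum_i\frac{\mathrm{tr}\,(V^{-1}A_i)^2}{\nu_i} = 0.11604, \quad \sum_i\frac{(\mathrm{tr}\,V^{-1}A_i)^2}{\nu_i} = 0.19209;$ (8-13)

whence $A = 1 + \tfrac14(0.19209) = 1.0480,$ (8-14)
$B = \tfrac18[0.11604 + \tfrac12(0.19209)] = 0.02651.$ (8-15)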

The values of the multiplier $A + B\chi^2$ and of $2h(a)$ (to the order $\nu^{-1}$) are given in Table 1.
On this basis our figure of 9.45 (equation (8-6)) can no longer be regarded as significant at
the 1 % level, but it is so at the 2½ % level.
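
The arithmetic of this example can be retraced in a few lines. The sketch below takes the variance matrices of (8-2), the sample sizes and divisors, and the difference vector $t$ printed in (8-6) as given, and evaluates (8-6) to (8-9); the variable and function names are illustrative only. Because it divides (8-2) by the sample sizes without intermediate rounding, its figures can differ from the printed ones in the last decimal place.

```python
def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(X):
    return X[0][0] + X[1][1]

# Data for the first two samples: variance matrices (8-2), sizes, divisors.
S = [[[120.0, -16.3], [-16.3, 17.8]],
     [[81.8, 32.1], [32.1, 53.8]]]
sizes = [16, 11]
nu = [15, 10]
t = [-3.23, -7.51]   # difference of sample centroids, as used in (8-6)
p = 2

# Estimated variance matrices of the means and of t.
A_mats = [[[e / n for e in row] for row in Si] for Si, n in zip(S, sizes)]
V = [[A_mats[0][i][j] + A_mats[1][i][j] for j in range(2)] for i in range(2)]
Vinv = inv2(V)

# Test statistic t' V^{-1} t of (8-6).
stat = sum(t[i] * Vinv[i][j] * t[j] for i in range(2) for j in range(2))

# Ingredients of (8-8) and (8-9).
M = [mul2(Vinv, Ai) for Ai in A_mats]
sum_sq_tr = sum(trace(Mi) ** 2 / v for Mi, v in zip(M, nu))        # sum (tr V^-1 A_i)^2 / nu_i
sum_tr_sq = sum(trace(mul2(Mi, Mi)) / v for Mi, v in zip(M, nu))   # sum tr (V^-1 A_i)^2 / nu_i
coef_A = 1 + sum_sq_tr / (2 * p)
coef_B = (sum_tr_sq + 0.5 * sum_sq_tr) / (p * (p + 2))

def two_h(chi2):
    """Corrected significance point (8-7) for a given chi-square point."""
    return chi2 * (coef_A + coef_B * chi2)

for level, chi2 in [(5, 5.991), (2.5, 7.378), (1, 9.210)]:
    print(level, round(two_h(chi2), 2), stat > two_h(chi2))
```

The loop prints, for each level, the corrected point and whether the statistic exceeds it, which is the comparison made in Table 1.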

An approximate 95 % confidence region for the true centroid difference, $\delta = \mu_1 - \mu_2$,
is given by $(t - \delta)'V^{-1}(t - \delta) \le 2h(a),$ (8-16)
i.e. $0.06976(\delta_1 + 3.23)^2 - 0.04414(\delta_1 + 3.23)(\delta_2 + 7.51) + 0.17357(\delta_2 + 7.51)^2 \le 7.23.$ (8-17)


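(b) Test for three centroids
In the notation of § 7 we have

$W_1 = A_1^{-1} = \begin{bmatrix} 0.1523 & 0.1396 \\ 0.1396 & 1.0272 \end{bmatrix}, \quad W_2 = \begin{bmatrix} 0.1756 & -0.1048 \\ -0.1048 & 0.2670 \end{bmatrix}, \quad W_3 = \begin{bmatrix} 0.1161 & -0.0277 \\ -0.0277 & 0.1199 \end{bmatrix},$ (8-18)

$W = \sum W_i = \begin{bmatrix} 0.4440 & 0.0071 \\ 0.0071 & 1.4141 \end{bmatrix}, \quad W^{-1} = \begin{bmatrix} 2.2524 & -0.0113 \\ -0.0113 & 0.7072 \end{bmatrix},$ (8-19)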

3-5980 —0-0738 ' 1-0060 
= oto = . are = = 8-20 
he fs omni WaX%s pare WsX%s Saal Pan 

i 45302] _ 9-9314 

=W,x; = = - x= W-IW,x, = 8-21 
ate ee Rept: Die Beers thi 
Thus $\sum_i (x_i - \bar x)'W_i(x_i - \bar x) = \sum_i x_i'(W_ix_i) - \bar x'(W\bar x) = 18.75.$ (8-22)


If we were to use the large-sample test, this would be compared with significance points of
$\chi^2$ on four degrees of freedom (corresponding to the four hypothetical constraints
$\mu_{11} = \mu_{12} = \mu_{13}$; $\mu_{21} = \mu_{22} = \mu_{23}$) as given in the second column of Table 2. On this (incorrect)
basis the hypothesis would be regarded as disproved at the 0.1 % level.

Table 2

Significance
level (%)        χ²        A + Bχ²      2h(a)
    5           9.488       1.283       12.17
    1          13.277       1.369       18.18
    0.1        18.467       1.488       27.48








† Note that $\mathrm{tr}\,Y^2 = \sum_j\sum_k y_{jk}y_{kj} = \sum_j y_{jj}^2 + 2\sum_{j<k} y_{jk}y_{kj}$.
The effect of 'Studentization' may be accounted for, to the order $\nu^{-1}$, by the use of (7-9),
which may be written

$2h(a) = \chi^2(A + B\chi^2),$ (8-23)

where $A = 1 + \frac{1}{2r}\sum_i\frac{[\mathrm{tr}\,(I - W^{-1}W_i)]^2}{\nu_i},$ (8-24)

$B = \frac{1}{r(r+2)}\Big[\sum_i\frac{\mathrm{tr}\,(I - W^{-1}W_i)^2}{\nu_i} + \frac12\sum_i\frac{[\mathrm{tr}\,(I - W^{-1}W_i)]^2}{\nu_i}\Big],$ (8-25)

and $r = p(k-1)$ is the number of degrees of freedom of $\chi^2$. Now



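$I - W^{-1}W_1 = \begin{bmatrix} 0.6585 & -0.3028 \\ -0.0970 & 0.2751 \end{bmatrix}, \quad I - W^{-1}W_2 = \begin{bmatrix} 0.6033 & 0.2391 \\ 0.0761 & 0.8100 \end{bmatrix},$
$I - W^{-1}W_3 = \begin{bmatrix} 0.7382 & 0.0637 \\ 0.0209 & 0.9149 \end{bmatrix};$ (8-26)
$\mathrm{tr}\,(I - W^{-1}W_1)^2 = 0.5680, \quad \mathrm{tr}\,(I - W^{-1}W_2)^2 = 1.0565,$
$\mathrm{tr}\,(I - W^{-1}W_3)^2 = 1.3846;$ (8-27)
$[\mathrm{tr}\,(I - W^{-1}W_1)]^2 = 0.8716, \quad [\mathrm{tr}\,(I - W^{-1}W_2)]^2 = 1.9974,$
$[\mathrm{tr}\,(I - W^{-1}W_3)]^2 = 2.7327;$ (8-28)
$\sum_i\frac{\mathrm{tr}\,(I - W^{-1}W_i)^2}{\nu_i} = 0.2820, \quad \sum_i\frac{[\mathrm{tr}\,(I - W^{-1}W_i)]^2}{\nu_i} = 0.5311;$ (8-29)
whence $A = 1 + \tfrac18(0.5311) = 1.0664,$
$B = \tfrac{1}{24}[0.2820 + \tfrac12(0.5311)] = 0.02281.$ (8-30)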


The values of the multiplier $A + B\chi^2$ and of $2h(a)$ (to the order $\nu^{-1}$) are given in Table 2.
Our figure of 18.75 (equation (8-22)) is certainly not significant at the 0.1 % level; it may be
so at the 1 % level, but such results should not of course be interpreted too literally,
particularly when working far out in the tail of the distribution.
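
The multipliers of Table 2 depend only on the three estimated variance matrices and their degrees of freedom, so they can be recomputed without the centroids. The sketch below evaluates (8-24), (8-25) and (8-23) from the matrices of (8-2); the names are illustrative, and since no intermediate rounding is applied the results may differ from the printed figures in the last place.

```python
def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(X):
    return X[0][0] + X[1][1]

# Variance matrices (8-2), sample sizes and divisors for the three samples.
S = [[[120.0, -16.3], [-16.3, 17.8]],
     [[81.8, 32.1], [32.1, 53.8]],
     [[100.3, 23.2], [23.2, 97.1]]]
sizes = [16, 11, 11]
nu = [15, 10, 10]
p, k = 2, 3
r = p * (k - 1)          # degrees of freedom of chi-square

# Weight matrices W_i = A_i^{-1} and their sum W, as in (7-6).
A_mats = [[[e / n for e in row] for row in Si] for Si, n in zip(S, sizes)]
W_i = [inv2(Ai) for Ai in A_mats]
W = [[sum(Wi[a][b] for Wi in W_i) for b in range(2)] for a in range(2)]
Winv = inv2(W)

# Matrices I - W^{-1} W_i, as in (8-26).
M = []
for Wi in W_i:
    P = mul2(Winv, Wi)
    M.append([[(1.0 if a == b else 0.0) - P[a][b] for b in range(2)]
              for a in range(2)])

sum_tr_sq = sum(trace(mul2(Mi, Mi)) / v for Mi, v in zip(M, nu))   # first sum in (8-29)
sum_sq_tr = sum(trace(Mi) ** 2 / v for Mi, v in zip(M, nu))        # second sum in (8-29)
coef_A = 1 + sum_sq_tr / (2 * r)
coef_B = (sum_tr_sq + 0.5 * sum_sq_tr) / (r * (r + 2))

def two_h(chi2):
    """Corrected significance point (8-23)."""
    return chi2 * (coef_A + coef_B * chi2)

for level, chi2 in [(5, 9.488), (1, 13.277), (0.1, 18.467)]:
    print(level, round(coef_A + coef_B * chi2, 3), round(two_h(chi2), 2))
```

A useful internal check is that the three matrices $I - W^{-1}W_i$ sum to $2I$, since $\sum W^{-1}W_i = I$.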


The author’s thanks are again due to Mr A. M. Walker, who originally suggested, and 
first worked on, the problem of comparing two multivariate centroids; a discrepancy in 
our results led to the consideration of the multivariate linear hypothesis of § 5 as a means of 
checking this particular result. Mr Walker also checked the original draft of this paper, and 
pointed out a major error. However, the results have since been much extended, and the 
paper rewritten, so that any remaining mistakes are solely the author’s responsibility. 
THE USE OF THE HANKEL TRANSFORM IN STATISTICS
I. GENERAL THEORY AND EXAMPLES 


By R. D. LORD 
Royal Technical College, Glasgow 


1. INTRODUCTION AND SUMMARY 


The use of the characteristic function in problems of the addition of independent random
vectors is well established. This paper is concerned with the particular forms taken when the
vectors have spherical distributions in $s$ dimensions, i.e. when all directions are equally
probable and the distribution of magnitudes is independent of direction. That the
characteristic function is then a Hankel transform follows by changing to polar co-ordinates in
the usual Cartesian form. A change to polars has sometimes been used in particular problems
of this class but usually in a manner which does not show that the Hankel transform is the
natural tool and that there is a theory for spherical distributions which runs parallel to the
theory of the addition of random variables in one dimension. Soper's little known tract
(Soper, 1922*) is an exception to this statement, for on pp. 38-47 he sketched a general
procedure. However, Soper's work is rather inaccessible, very brief and uses a terminology
different from that which has since become standard; he also uses the less convenient
moment-generating function. A new account should therefore be useful, especially as the
value of the Hankel transform in certain problems of mathematical physics is now known
(Sneddon, 1951; Tranter, 1951).

The form of the characteristic function is obtained in §2. In §3 it is shown that it can also 
be obtained by the method used by Kluyver to solve Pearson’s original random-walk 
problem. Another general procedure would be to use the appropriate form of the convolution. 
Although this method is not touched on here, it may be remarked that many of the integrals 
which arise in particular problems are evaluated in Watson (1944) by a disguised form of 
the method. The characteristic function is also a moment-generating function and §4 deals 
with this aspect. Since moments of order greater than two are not additive, cumulants are 
introduced as in the standard one-dimensional case. 

Spherical distributions are often derived by projection from other spherical distributions
in higher dimensional space. In §5 it is shown that distributions so related have the same
characteristic function, and that solutions of $s$-dimensional problems can be derived quite
simply from solutions in one or two dimensions. In §§6-9 are considered some special
distributions and examples: normal, Cauchy and exponential. The random-walk problem
of Pearson is left to another paper.

The special functions of mathematics have usually arisen first in potential problems, and 
arguments about potential are often used to provide physical pictures of analytical results. 
Our results can be used to provide similar pictures in terms of probability. Further, we 


* I owe my acquaintance with it to Prof. Aitken, who told me of Soper’s little book and kindly lent 
me his copy when I gave a paper at a meeting of the Edinburgh Mathematical Society on a solution 
of Pearson’s problem using Hankel transforms (Lord, 1948). The main results of the present paper 
have been known to me since early in 1948; some of them were obtained independently by Dr N. B. 
Slater. 
obtain integrals as answers to our problems in probability; reversing the viewpoint, a
probability experiment may be a practical method of evaluating these integrals, i.e. the 
results given here suggest possible ‘Monte Carlo’ methods of evaluating integrals involving 
Bessel functions. 


2. THE CHARACTERISTIC FUNCTION AND INVERSION THEOREMS
2·1. Let $X$ be an $s$-dimensional random vector with probability function $p(X)$, i.e. let
$p(X)\,dX$ be the probability that the end-point of the vector lies in the volume element $dX$
at the point whose position vector is $X$. The characteristic function of the distribution of
$X$ is $\phi(T)$, the mean value of $\exp(iX.T)$, where $T$ is a variable but not random vector, and
$X.T$ denotes the scalar product of $X$ and $T$, i.e.

$\phi(T) = \int \exp(iX.T)\,p(X)\,dX,$ (1)

where the integration is over the whole space. This integral is absolutely convergent and this
justifies its subsequent expression as a repeated integral.


Now let $X = X_1 + X_2 + \dots + X_n$,

where $X_1, X_2, \dots, X_n$ are independent random vectors with characteristic functions $\phi_1(T)$,
$\phi_2(T), \dots, \phi_n(T)$. Then it is well known that

$\phi(T) = \phi_1(T)\,\phi_2(T) \dots \phi_n(T).$ (2)

This equation is fundamental in the process of finding the probability function of $X$. To
complete the process we use the inversion formula

$p(X) = (2\pi)^{-s}\int \exp(-iX.T)\,\phi(T)\,dT,$ (3)

where now the integration is over the whole $T$ space. We use these equations in the ordinary
way, i.e. find $\phi_1$, $\phi_2$, etc., from (1), substitute in (2) to obtain $\phi$ and then in (3) to obtain $p(X)$.
However, we compute (1) and (3) from analytical forms different from the usual Cartesian
ones.

2-2. When X has a spherical distribution, p(X) is a function of |X| = r only; write it as
p(r). The volume element is dX = dr dS, where dS is an element of surface of the s-dimensional
sphere of radius r, i.e. $$dX = r^{s-1}\,dr\,d\Sigma_s,$$
where $\Sigma_s$ is the surface of the unit sphere in s dimensions. Also
$$X.T = r\rho\cos\theta,$$
where $\rho = |T|$ and $\theta$ is the angle between X and T. Hence

$$\phi(T) = \int_0^\infty r^{s-1}p(r)\,dr\int \exp(ir\rho\cos\theta)\,d\Sigma_s. \tag{4}$$

The inner integral is over $\Sigma_s$, and clearly its value is independent of the direction of T and
is a function of $r\rho$ only. Thus $\phi(T)$ is a function of $\rho$ only, say $\phi(T) = \Phi(\rho)$.
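The statement that the characteristic function depends on T only through $\rho$ can be illustrated numerically. The sketch below is not part of the paper: it takes s = 3 and a vector of fixed length r with uniformly random direction, for which the characteristic function is $\sin(r\rho)/(r\rho)$, and estimates the mean of $\exp(iX.T)$ by simulation for two vectors T of equal length but different direction. The function names, sample size and seed are illustrative assumptions.

```python
import cmath
import math
import random


def random_direction(rng):
    """Uniform random unit vector in 3 dimensions (normalised Gaussian triple)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def cf_estimate(T, r=1.0, n=50_000, seed=1):
    """Monte Carlo mean of exp(i X.T) for X = r * (uniform random direction)."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(n):
        d = random_direction(rng)
        total += cmath.exp(1j * r * sum(a * b for a, b in zip(d, T)))
    return total / n


# Two vectors T with the same length rho = 2 but different directions.
exact = math.sin(2.0) / 2.0
print(cf_estimate((2.0, 0.0, 0.0)), cf_estimate((1.2, 1.6, 0.0)), exact)
```

Both estimates should agree with $\sin(r\rho)/(r\rho)$ to within sampling error, and their imaginary parts should be close to zero.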

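To evaluate the inner integral we take polar co-ordinates $(r, \theta_1, \theta_2, \dots, \theta_{s-1})$, where
$\theta_1, \theta_2, \dots, \theta_{s-2}$ run from 0 to $\pi$, and $\theta_{s-1}$ from 0 to $2\pi$. Since the integral does not depend on
the direction T we can take T along $\theta_1 = 0$ and then $\theta = \theta_1$. In the important cases s = 2, 3,
we obviously obtain (respectively)

$$\int_0^{2\pi}\exp(ir\rho\cos\theta_1)\,d\theta_1, \qquad 2\pi\int_0^{\pi}\exp(ir\rho\cos\theta_1)\sin\theta_1\,d\theta_1,$$

i.e. $2\pi J_0(r\rho)$, $(4\pi\sin r\rho)/(r\rho)$,
which substituted in (4) give the first equations of each of (31) and (32) below.
In the general case we use the transformation
$$\begin{aligned}
x_1 &= r\cos\theta_1,\\
x_2 &= r\sin\theta_1\cos\theta_2,\\
x_j &= r\sin\theta_1\sin\theta_2\dots\sin\theta_{j-1}\cos\theta_j,\\
&\;\;\vdots\\
x_{s-1} &= r\sin\theta_1\sin\theta_2\dots\sin\theta_{s-2}\cos\theta_{s-1},\\
x_s &= r\sin\theta_1\sin\theta_2\dots\sin\theta_{s-2}\sin\theta_{s-1},
\end{aligned}$$
and easily find, using $\Sigma_s = 2\pi^{\frac12 s}/\Gamma(\tfrac12 s)$,
that the inner integral in (4) has the value
$$(2\pi)^{\frac12 s}(r\rho)^{1-\frac12 s}J_{\frac12 s-1}(r\rho).$$
We then have $$\Phi(\rho) = (2\pi)^{\frac12 s}\rho^{1-\frac12 s}\int_0^\infty r^{\frac12 s}J_{\frac12 s-1}(r\rho)\,p(r)\,dr. \tag{5}$$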


Since $\Phi(\rho)$ is a function of $\rho$ only, the same process applied to (3) will give the inversion
formula

$$p(r) = (2\pi)^{-\frac12 s}r^{1-\frac12 s}\int_0^\infty \rho^{\frac12 s}J_{\frac12 s-1}(r\rho)\,\Phi(\rho)\,d\rho. \tag{6}$$


A different method of transformation will be found in Sneddon (1951) or in Bochner & 
Chandrasekharan (1949). 

2-3. Instead of p(r) we shall often use the function P(r), where P(r) dr is the probability
that |X| lies between r and r+dr; P(r) is thus the probability function of the distribution
of |X|, which is one-dimensional, while p(r) refers to the s-dimensional distribution of X.

$$P(r) = r^{s-1}p(r)\,\Sigma_s = 2\pi^{\frac12 s}r^{s-1}p(r)/\Gamma(\tfrac12 s). \tag{7}$$
If we define $\Lambda_\nu(z)$ with Jahnke & Emde (1945) and others, by

$$\Lambda_\nu(z) = \Gamma(\nu+1)(\tfrac12 z)^{-\nu}J_\nu(z) = 1 - \frac{z^2}{2(2\nu+2)} + \frac{z^4}{2\cdot4(2\nu+2)(2\nu+4)} - \dots, \tag{8}$$
then (5) takes the very simple form

$$\Phi(\rho) = \int_0^\infty P(r)\,\Lambda_{\frac12 s-1}(r\rho)\,dr. \tag{9}$$
The inversion formula for P(r) is
$$P(r) = 2^{1-\frac12 s}\{\Gamma(\tfrac12 s)\}^{-1}\int_0^\infty (r\rho)^{\frac12 s}J_{\frac12 s-1}(r\rho)\,\Phi(\rho)\,d\rho. \tag{10}$$

Another alternative is to use the distribution function (or 'probability integral')

$$F(r) = \int_0^r P(u)\,du. \tag{11}$$
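The series (8) is easy to evaluate directly, which gives a quick check on the special cases $\Lambda_{-\frac12}(z) = \cos z$ and $\Lambda_{\frac12}(z) = \sin z/z$ that arise for s = 1 and s = 3. The following is a minimal sketch, not taken from the paper; the function name `lam` and the stopping rule are illustrative assumptions.

```python
import math


def lam(nu, z, max_terms=200):
    """Lambda_nu(z) = Gamma(nu+1) (z/2)^(-nu) J_nu(z), summed from the power series (8)."""
    term = 1.0   # the k = 0 term of the series
    total = 1.0
    for k in range(max_terms):
        # ratio of consecutive terms: -(z^2/4) / ((k+1)(nu+k+1))
        term *= -(z * z / 4.0) / ((k + 1) * (nu + k + 1))
        total += term
        if abs(term) < 1e-17 * max(1.0, abs(total)):
            break
    return total


for z in (0.3, 1.0, 2.5):
    # both differences should be at rounding level
    print(z, lam(-0.5, z) - math.cos(z), lam(0.5, z) - math.sin(z) / z)
```

The series is adequate for moderate arguments; for large z it loses accuracy through cancellation and an asymptotic form would be preferable.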
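If we put $P(r) = F'(r)$ in (9) and integrate by parts, using F(0) = 0 and the fact that
$\Lambda_{\frac12 s-1}(z)$ is of order $z^{-\frac12 s+\frac12}$ at infinity, we obtain

$$\begin{aligned}
\Phi(\rho) &= (\rho^2/s)\int_0^\infty r\,\Lambda_{\frac12 s}(r\rho)\,F(r)\,dr\\
&= 2^{\frac12 s-1}\,\Gamma(\tfrac12 s)\,\rho^{2-\frac12 s}\int_0^\infty r^{1-\frac12 s}J_{\frac12 s}(r\rho)\,F(r)\,dr.
\end{aligned} \tag{12}$$
The inverse formula is
$$F(r) = 2^{1-\frac12 s}\{\Gamma(\tfrac12 s)\}^{-1}r^{\frac12 s}\int_0^\infty \rho^{\frac12 s-1}J_{\frac12 s}(r\rho)\,\Phi(\rho)\,d\rho. \tag{13}$$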


2-4. Since $\Lambda_\nu(x)$ is a continuous function with unity for upper bound, attained when
x = 0, it follows from (9) that $\Phi(\rho)$ is a continuous function of $\rho$ and that

$$\Phi(0) = \int_0^\infty P(r)\,dr = 1, \tag{14}$$

$$|\Phi(\rho)| < 1 \quad (\rho > 0). \tag{15}$$

Also if (as is commonly the case, though the Cauchy distribution of §8 is an exception) we
can integrate the right-hand side of (9) after expanding the Bessel function in a power series,
$\Phi(\rho)$ will be an even function of $\rho$ since $\Lambda_\nu(x)$ is even.
The analogue of the Riemann-Lebesgue lemma (Watson, 1944, p. 457) enables us to
assert more than (15), namely,
$$\Phi(\rho) = o(\rho^{\frac12-\frac12 s}),$$
as $\rho$ tends to infinity, provided $\int_0^\infty r^{\frac12-\frac12 s}P(r)\,dr$ exists; this integral converges at the upper
limit but might fail to do so at the lower. For distributions with a P(r) which is differentiable,
the order of $\Phi(\rho)$ at infinity can be proved smaller by applying the Riemann-Lebesgue
lemma after (repeated) integration by parts. Conversely, the order of $\Phi(\rho)$ at infinity
provides us with information about the differentiability of P(r).


3. PROOF BY THE METHOD OF KLUYVER AND MARKOFF

3-1. We now show how the form of the characteristic function, its fundamental property 
(2) and the inversion formula could have been obtained by the method that Kluyver (1906) 
used to solve Pearson’s problem; Kluyver’s proof is reproduced with some generalization 
in Watson (1944). The corresponding proof for Cartesians was given by Chandrasekhar 
(1943), who then derived some special cases by a change of variables. Following his usage 
the method has often been called Markoff’s method, but it was also used by Kluyver and 
may well be much older; there are indications that, following Kluyver, Heaviside used it in 
the missing fourth volume of his Electromagnetic Theory. It is inferior to the method of §2, 
at least in so far as it gives the final result in a form which obscures the importance of the 
characteristic function. 

It is sufficient to deal with X, the sum of two independent vectors $X_1$, $X_2$ with probability
functions $p_1(X)$, $p_2(X)$. It will be shown that the distribution function F(r) is given by (13)
with $\Phi = \Phi_1\Phi_2$, where $\Phi$ is defined as in (5) and $\Phi_1$, $\Phi_2$ differ from $\Phi$ in having $p_1(r)$, $p_2(r)$
respectively instead of p(r). This is equivalent to the results of §2.
The general idea of the method is as follows. To find the probability that $X_1$, $X_2$ satisfy
a certain condition C we have to evaluate

$$\int p_1(X_1)\,p_2(X_2)\,dX_1\,dX_2, \tag{16}$$

taken over all values of $X_1$, $X_2$ satisfying C. Let $\delta(X_1, X_2, C)$ be a function equal to unity
when C is satisfied and zero otherwise. Then the required probability is

$$\int p_1(X_1)\,p_2(X_2)\,\delta(X_1, X_2, C)\,dX_1\,dX_2,$$

where now the integration is over all values of $X_1$, $X_2$. If $\delta(X_1, X_2, C)$ can be expressed in the
form of an integral, e.g. Markoff uses Dirichlet's discontinuous factor, (16) may be evaluable
by changes of order of integration.

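3-2. We require the probability that $|X_1 + X_2| < r$, i.e. that
$$R = (r_1^2 + r_2^2 - 2r_1r_2\cos\theta)^{\frac12} < r,$$
where $R, r_1, r_2$ are the magnitudes of $X_1 + X_2$, $X_1$, $X_2$ respectively and $\theta$ is the angle between
$X_1$ and $X_2$. Now
$$r^{\frac12 s}\int_0^\infty J_{\frac12 s}(r\rho)\,\frac{J_{\frac12 s-1}(R\rho)}{R^{\frac12 s-1}}\,d\rho = \begin{cases}1, & \text{if } R < r,\\ 0, & \text{if } R > r.\end{cases}$$

This integral will be used as $\delta(X_1, X_2, C)$.
We first evaluate
$$I = \int p_2(X_2)\,\delta(X_1, X_2, C)\,dX_2$$
for a fixed value of $X_1$. We use $\theta$ as a co-latitude in defining the position of $X_2$, and put
$$p_2(X_2)\,dX_2 = p_2(r_2)\,r_2^{s-1}\,dr_2\,d\Sigma_s = p_2(r_2)\,r_2^{s-1}\,dr_2\,\sin^{s-2}\theta\,d\theta\,d\Sigma_{s-1}.$$
We thus obtain
$$I = r^{\frac12 s}\int p_2(r_2)\,J_{\frac12 s}(r\rho)\,\frac{J_{\frac12 s-1}(R\rho)}{R^{\frac12 s-1}}\,r_2^{s-1}\sin^{s-2}\theta\,d\rho\,dr_2\,d\theta\,d\Sigma_{s-1},$$
where the integration is over the surface of the unit sphere $\Sigma_{s-1}$, from 0 to $\pi$ for $\theta$ and from
0 to $\infty$ for $\rho$ and $r_2$.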

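Since the polar co-ordinates (other than $r_2$ and $\theta$) needed to specify $X_2$ enter only into
$d\Sigma_{s-1}$, the integration over the sphere merely gives us a factor $\Sigma_{s-1}$. For the integration with
respect to $\theta$ we use (Watson, 1944, p. 367 (16))

$$\int_0^\pi \frac{J_{\frac12 s-1}(R\rho)}{R^{\frac12 s-1}}\sin^{s-2}\theta\,d\theta = 2^{\frac12 s-1}\,\pi^{\frac12}\,\Gamma(\tfrac12 s-\tfrac12)\,\rho^{1-\frac12 s}\,\frac{J_{\frac12 s-1}(\rho r_1)}{r_1^{\frac12 s-1}}\,\frac{J_{\frac12 s-1}(\rho r_2)}{r_2^{\frac12 s-1}}.$$

Substituting in I and carrying out the integration with respect to $r_2$,
$$I = r^{\frac12 s}\int_0^\infty \frac{J_{\frac12 s-1}(\rho r_1)}{r_1^{\frac12 s-1}}\,J_{\frac12 s}(r\rho)\,\Phi_2(\rho)\,d\rho.$$

Now substitute for I in (16) with $p_1(X_1)\,dX_1$ put equal to $p_1(r_1)\,r_1^{s-1}\,dr_1\,d\Sigma_s$:

$$F(r) = r^{\frac12 s}\int p_1(r_1)\,r_1^{s-1}\,\frac{J_{\frac12 s-1}(\rho r_1)}{r_1^{\frac12 s-1}}\,J_{\frac12 s}(r\rho)\,\Phi_2(\rho)\,d\rho\,dr_1\,d\Sigma_s,$$

where the integration is over $\Sigma_s$ and from 0 to $\infty$ for $\rho$ and $r_1$. The integration over $\Sigma_s$ gives
a factor $\Sigma_s$ and that with respect to $r_1$ gives

$$\int_0^\infty p_1(r_1)\,r_1^{\frac12 s}\,J_{\frac12 s-1}(\rho r_1)\,dr_1,$$
i.e. $(2\pi)^{-\frac12 s}\rho^{\frac12 s-1}\Phi_1(\rho)$.

Thus $$F(r) = 2^{1-\frac12 s}\{\Gamma(\tfrac12 s)\}^{-1}r^{\frac12 s}\int_0^\infty \rho^{\frac12 s-1}J_{\frac12 s}(r\rho)\,\Phi_1(\rho)\,\Phi_2(\rho)\,d\rho,$$
and this is (13) with $\Phi = \Phi_1\Phi_2$.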


4. MOMENTS AND CUMULANTS 
4-1. The $\nu$th moment of the distribution of X is defined by the equation

$$\mu_\nu = \int_0^\infty P(r)\,r^\nu\,dr. \tag{17}$$

If in (9) we expand the Bessel function according to (8) and integrate term by term we obtain
$$\Phi(\rho) = 1 - \frac{\mu_2\rho^2}{2s} + \frac{\mu_4\rho^4}{2\cdot4\,s(s+2)} - \dots, \tag{18}$$

so that $\Phi(\rho)$ can be used as a moment-generating function. If only moments up to and
including $\mu_{2\nu}$ exist then (18) will finish with a term $o(\mu_{2\nu}\rho^{2\nu})$. $\Phi(\rho)$ generates only even
moments, but this is not surprising since we are dealing with symmetrical distributions.
P(r) has been defined to be unsymmetrical about r = 0, but it could be made symmetrical
if we restricted the angular co-ordinate $\theta_{s-1}$ to run from 0 to $\pi$ and allowed negative
values of r.

4-2. Multiplying series such as (18) clearly shows that the second moment of the sum
$(X_1 + X_2 + \dots)$ is the sum of the second moments of $X_1, X_2, \dots$, and that this additive property
does not extend to the higher moments. The coefficients in the expansion of $\log\Phi(\rho)$ will,
however, be additive. Hence we define cumulants $\lambda_2, \lambda_4, \lambda_6, \dots$ by the equation

$$\log\Phi(\rho) = -\frac{\lambda_2\rho^2}{2s} + \frac{\lambda_4\rho^4}{2\cdot4\,s(s+2)} - \frac{\lambda_6\rho^6}{2\cdot4\cdot6\,s(s+2)(s+4)} + \dots, \tag{19}$$

$\log\Phi(\rho)$ being the cumulant-generating function. When s = 1, these reduce to the cumulants
of a symmetrical one-dimensional distribution as usually defined (see, for example, Kendall,
1943). If in other cases we need to make a distinction we can use the term polar cumulants.
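4-3. Formulae connecting moments and cumulants of lowest order are

$$\mu_2 = \lambda_2; \tag{20}$$

$$\begin{aligned}
\mu_4 &= \lambda_4 + \frac{s+2}{s}\lambda_2^2,\\
\mu_6 &= \lambda_6 + 3\,\frac{s+4}{s}\lambda_2\lambda_4 + \frac{(s+2)(s+4)}{s^2}\lambda_2^3,\\
\mu_8 &= \lambda_8 + 4\,\frac{s+6}{s}\lambda_2\lambda_6 + 3\,\frac{(s+4)(s+6)}{s(s+2)}\lambda_4^2 + 6\,\frac{(s+4)(s+6)}{s^2}\lambda_2^2\lambda_4 + \frac{(s+2)(s+4)(s+6)}{s^3}\lambda_2^4;
\end{aligned} \tag{21}$$

$$\begin{aligned}
\lambda_4 &= \mu_4 - \frac{s+2}{s}\mu_2^2,\\
\lambda_6 &= \mu_6 - 3\,\frac{s+4}{s}\mu_2\mu_4 + 2\,\frac{(s+2)(s+4)}{s^2}\mu_2^3,\\
\lambda_8 &= \mu_8 - 4\,\frac{s+6}{s}\mu_2\mu_6 - 3\,\frac{(s+4)(s+6)}{s(s+2)}\mu_4^2 + 12\,\frac{(s+4)(s+6)}{s^2}\mu_2^2\mu_4 - 6\,\frac{(s+2)(s+4)(s+6)}{s^3}\mu_2^4.
\end{aligned} \tag{22}$$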

These and further formulae can be obtained by equating coefficients of powers of $\rho$ in the
equation obtained by substituting from (18) into (19). It is, however, easier to utilize the
known results for the standard non-symmetrical case in one dimension. Write

$$\mu'_\nu = \frac{\mu_{2\nu}}{s(s+2)\dots(s+2\nu-2)}, \qquad \kappa_\nu = \frac{\lambda_{2\nu}}{s(s+2)\dots(s+2\nu-2)}, \tag{23}$$
and also $t = -\frac12\rho^2$. With this notation we are to equate coefficients in
$$\log\Bigl(1 + \mu'_1t + \frac{1}{2!}\mu'_2t^2 + \dots\Bigr) = \kappa_1t + \frac{1}{2!}\kappa_2t^2 + \dots.$$

But this is precisely the equation that we use in the ordinary one-dimensional case (when
$\mu'_\nu$ is the $\nu$th moment about an arbitrary origin and $\kappa_\nu$ is the $\nu$th cumulant; see, for example,
Kendall (1943), equation (3-22)). Consequently the formulae worked out for this standard
case will give the formulae we seek if we replace $\mu'_\nu$, $\kappa_\nu$ by the expressions in (23). From
equations (3-29) and (3-33) of Kendall (1943) may be obtained expressions for moments
and cumulants up to and including $\mu_{20}$, $\lambda_{20}$. It should be emphasized that the notation in
(23) is purely temporary and intended only for the purposes of this paragraph.
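The reduction to the one-dimensional case can be sketched in a few lines. The code below is an illustration only, not the author's computation: it scales the even moments as in (23), applies the standard one-dimensional recursion between moments about an origin and cumulants, and scales back to obtain polar cumulants. The function names are hypothetical. As test input it uses the moments $\mu_{2\nu} = s(s+1)\dots(s+2\nu-1)$ of the exponential distribution of §9 with b = 1 and s = 3, and the moments of a spherical normal distribution, for which every cumulant beyond the second should vanish.

```python
from fractions import Fraction
from math import comb


def step2_product(s, nu):
    """s(s+2)...(s+2nu-2), the divisor appearing in (23)."""
    out = Fraction(1)
    for j in range(nu):
        out *= s + 2 * j
    return out


def polar_cumulants(mu_even, s):
    """Map [mu_2, mu_4, ...] to [lambda_2, lambda_4, ...] using the substitution (23)."""
    n = len(mu_even)
    # scaled moments mu'_1, mu'_2, ...; index 0 holds mu'_0 = 1
    m = [Fraction(1)] + [Fraction(mu_even[v - 1]) / step2_product(s, v)
                         for v in range(1, n + 1)]
    kappa = [Fraction(0)] * (n + 1)
    for v in range(1, n + 1):
        # standard one-dimensional recursion from raw moments to cumulants
        kappa[v] = m[v] - sum(comb(v - 1, j - 1) * kappa[j] * m[v - j]
                              for j in range(1, v))
    return [kappa[v] * step2_product(s, v) for v in range(1, n + 1)]


print(polar_cumulants([12, 360, 20160], 3))   # exponential case, s = 3, b = 1
print(polar_cumulants([3, 15, 105], 3))       # spherical normal, s = 3, mu_2 = 3
```

Exact rational arithmetic is used so that the results can be compared with (22) without rounding.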


5. PROJECTED DISTRIBUTIONS 
5-1. When a spherical distribution $D_s$ in s dimensions is projected on to a subspace of
q dimensions passing through the origin the result is obviously another spherical distribution
$D_q$; for example, a three-dimensional distribution might be projected on to a diametral
plane (s = 3, q = 2), or on to a diameter (s = 3, q = 1). Using Cartesian co-ordinates
$x_1, x_2, \dots, x_q, \dots, x_s$ with $x_1, \dots, x_q$ as the co-ordinates in the space of the projection, the
probability function for $D_q$ is

$$p_q(x_1, x_2, \dots, x_q) = \int p_s(x_1, x_2, \dots, x_s)\,dx_{q+1}\dots dx_s, \tag{24}$$

with characteristic function
$$\begin{aligned}
\phi_q(t_1, t_2, \dots, t_q) &= \int p_q(x_1, \dots, x_q)\exp[i(t_1x_1+t_2x_2+\dots+t_qx_q)]\,dx_1\,dx_2\dots dx_q\\
&= \int p_s(x_1, x_2, \dots, x_s)\exp[i(t_1x_1+t_2x_2+\dots+t_qx_q)]\,dx_1\,dx_2\dots dx_s,
\end{aligned}$$

i.e. $$\phi_q(t_1, t_2, \dots, t_q) = \phi_s(t_1, t_2, \dots, t_q, 0, \dots, 0), \tag{25}$$
where $\phi_s$ is the characteristic function of $D_s$. This holds for any distribution, spherical or not.
However, if $D_s$ is spherical, then both sides of (25) are functions of $\rho_q = (t_1^2+t_2^2+\dots+t_q^2)^{\frac12}$
only, say $\Phi_q(\rho_q)$ and $\Phi_s(\rho_q)$. Hence

$$\Phi_q(\rho_q) = \Phi_s(\rho_q), \tag{26}$$
i.e. if $\Phi(\rho)$ is the characteristic function of the spherical distribution $D_s$, then $\Phi(\rho_q)$ is the
characteristic function of its projection $D_q$; in other words, a spherical distribution and its
projections all have the same characteristic function.

5-2. From this there immediately follow relations between $\mu_2, \mu_4, \dots$, the moments of $D_s$,
and $m_2, m_4, \dots$, the moments of the projection $D_q$. Expanding the two sides of (26) according
to (18) and equating coefficients of powers of $\rho_q$,

$$\frac{\mu_2}{s} = \frac{m_2}{q}, \qquad \frac{\mu_4}{s(s+2)} = \frac{m_4}{q(q+2)}, \qquad \frac{\mu_6}{s(s+2)(s+4)} = \frac{m_6}{q(q+2)(q+4)}, \qquad \dots, \tag{27}$$

with exactly similar relations between cumulants. For example, the distribution of a
globular star-cluster is often spherical and has to be inferred from its projection on the
'plane of the sky'. The moments of the space distribution are obtainable from those of the
plane distribution by applying (27) with s = 3, q = 2, and are $\frac32 m_2$, $\frac{15}{8}m_4$, $\frac{35}{16}m_6$, etc.
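The numerical factors in the star-cluster example follow mechanically from (27). A small sketch, with a hypothetical helper name, computes the factor multiplying $m_{2\nu}$ for any pair (s, q):

```python
from fractions import Fraction


def space_moment_factor(s, q, nu):
    """Factor c such that mu_{2nu} = c * m_{2nu}, read off from (27)."""
    c = Fraction(1)
    for j in range(nu):
        c *= Fraction(s + 2 * j, q + 2 * j)
    return c


# s = 3, q = 2: factors for the second, fourth and sixth moments
print([space_moment_factor(3, 2, nu) for nu in (1, 2, 3)])
```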

5-3. Equation (26) is one way of expressing the relation between $D_s$ and its projection $D_q$.
There are others in which the probability functions $p_s(r)$ and $p_q(r)$ are expressed in terms of
one another. Change to polar co-ordinates in (24), letting r be radius vector in the space in
which a point has co-ordinates $(x_1, x_2, \dots, x_q)$, and u be radius vector in the space in which
a point has co-ordinates $(x_{q+1}, x_{q+2}, \dots, x_s)$. Then

$$p_q(r) = \int p_s(\sqrt{r^2+u^2})\,u^{s-q-1}\,du\,d\Sigma_{s-q},$$
the integration being over the whole of the second space, i.e.
$$p_q(r) = \Sigma_{s-q}\int_0^\infty p_s(\sqrt{r^2+u^2})\,u^{s-q-1}\,du,$$

or putting s = q + 2m and $t^2 = r^2 + u^2$,
$$p_q(r) = 2\pi^m\{\Gamma(m)\}^{-1}\int_r^\infty p_{q+2m}(t)\,(t^2-r^2)^{m-1}\,t\,dt. \tag{28}$$

To obtain $p_{q+2m}(r)$ in terms of $p_q(r)$, apply the operator $\left(-\dfrac{1}{r}\dfrac{d}{dr}\right)^m$ to (6) with s = q, using
the recurrence relation

$$\left(-\frac{1}{z}\frac{d}{dz}\right)^m\frac{J_n(z)}{z^n} = \frac{J_{n+m}(z)}{z^{n+m}}.$$

The result is $$p_{q+2m}(r) = \left(-\frac{1}{2\pi r}\frac{d}{dr}\right)^m p_q(r) \tag{29}$$

if m is an integer. If 2m = 1, application of (28) and (29) in succession gives

$$p_{q+1}(r) = 2\int_r^\infty \frac{p_{q+2}(t)\,t\,dt}{\sqrt{t^2-r^2}} = -\frac{1}{\pi}\int_r^\infty \frac{p'_q(t)\,dt}{\sqrt{t^2-r^2}}, \tag{30}$$

in the proof of which we have regarded $D_{q+1}$, $D_q$ as projections of the same $D_{q+2}$. Alternatively
regarding $D_{q+1}$, $D_q$ as having a common projection $D_{q-1}$

$$p_{q+1}(r) = -\frac{1}{2\pi r}\frac{d}{dr}\,p_{q-1}(r) = -\frac{1}{\pi r}\frac{d}{dr}\int_r^\infty \frac{p_q(t)\,t\,dt}{\sqrt{t^2-r^2}}, \tag{30a}$$

a form in which the integral is more likely to converge at the lower limit; divergence can
occur only if t = r is an infinity of $p'_q(t)$ in (30), or of $p_q(t)$ in (30a).
5-4. Equations (29), (30) and (30a) can be unified by the concept of integrals of any
order. The appropriate definition of $I_\alpha f$, the integral of f(x) of positive order $\alpha$, is

$$I_\alpha f = \{\Gamma(\alpha)\}^{-1}\int_x^\infty (t-x)^{\alpha-1}f(t)\,dt.$$

It is well known (and easily verified) that

$$I_1f = \int_x^\infty f(t)\,dt, \qquad I_\alpha(I_\beta f) = I_{\alpha+\beta}f,$$

if the integrals are absolutely convergent, so that when $\alpha$ is an integer, $I_\alpha f$ is the $\alpha$-fold
integral of f(x) as ordinarily defined. The operation inverse to $I_1$ is differentiation with
respect to $-x$ and we define a derivative of order $\alpha$ by
$$\frac{d^\alpha f}{d(-x)^\alpha} = \frac{d}{d(-x)}\,I_{1-\alpha}f \quad (0 < \alpha < 1),$$
$$\frac{d^\alpha f}{d(-x)^\alpha} = \frac{d^2}{d(-x)^2}\,I_{2-\alpha}f \quad (1 < \alpha < 2),$$
and so on.
If now we put $r^2 = R$, $t^2 = T$, $p_q(r) = f_q(R)$ in (28), we obtain
$$\frac{f_q(R)}{\pi^m} = I_mf_{q+2m}(R). \tag{28a}$$
It then follows that $$f_{q+2m}(R) = \pi^{-m}\frac{d^m}{d(-R)^m}f_q(R), \tag{29a}$$

which reduces to (29) when m is an integer, and to (30a) if $m = \frac12$.
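The integral of order $\alpha$ defined in §5-4 can be checked numerically in cases where the answer is known in closed form. The sketch below is an illustration under stated assumptions, not part of the paper: it uses a plain midpoint rule on $u = t - x$, truncates the range at a finite upper limit, and is intended only for $\alpha \ge 1$, where the integrand is bounded at $u = 0$. For $f(t) = e^{-t}$ every order gives $I_\alpha f = e^{-x}$, and for $f(t) = te^{-t}$ the order-2 integral should equal the ordinary twofold integral $(x+2)e^{-x}$.

```python
import math


def frac_integral(f, alpha, x, upper=40.0, h=1e-3):
    """I_alpha f at x by the midpoint rule on u = t - x over (0, upper); alpha >= 1 assumed."""
    n = int(upper / h)
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += u ** (alpha - 1.0) * f(x + u)
    return total * h / math.gamma(alpha)


x = 0.7
print(frac_integral(lambda t: math.exp(-t), 1.5, x), math.exp(-x))
print(frac_integral(lambda t: t * math.exp(-t), 2.0, x), (x + 2.0) * math.exp(-x))
```

Orders below 1 have an integrable singularity at the lower limit and would need a substitution or a weighted rule rather than this simple quadrature.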


6. THE CASES s = 1, 2, 3
6-1. If s = 1, the vectors are confined to a straight line with a distribution symmetrical
about the origin. If p(x) is the probability function, then P(r), the probability function for
r = |x|, is equal to 2p(x) and the characteristic function is
$$\phi(t) = \int_{-\infty}^{\infty} p(x)\,e^{itx}\,dx = \int_0^\infty P(r)\cos rt\,dr,$$
which agrees with (9), since $\Lambda_{-\frac12}(z) = \cos z$.
If s = 2, the principal formulae are

$$\left.\begin{aligned}
\Phi(\rho) &= 2\pi\int_0^\infty rJ_0(r\rho)\,p(r)\,dr = \int_0^\infty J_0(r\rho)\,P(r)\,dr,\\
P(r) &= 2\pi r\,p(r) = \int_0^\infty r\rho\,J_0(r\rho)\,\Phi(\rho)\,d\rho,\\
F(r) &= r\int_0^\infty J_1(r\rho)\,\Phi(\rho)\,d\rho.
\end{aligned}\right\} \tag{31}$$


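If s = 3, the Bessel function involved is an elementary function, since $\Lambda_{\frac12}(z) = \sin z/z$. The
principal formulae are

$$\left.\begin{aligned}
\Phi(\rho) &= (4\pi/\rho)\int_0^\infty r\,p(r)\sin r\rho\,dr = \int_0^\infty (\sin r\rho/r\rho)\,P(r)\,dr,\\
P(r) &= 4\pi r^2p(r) = (2/\pi)\int_0^\infty r\rho\,\Phi(\rho)\sin r\rho\,d\rho,\\
F(r) &= (2/\pi)\int_0^\infty (\sin r\rho - r\rho\cos r\rho)\,\Phi(\rho)\,d\rho/\rho.
\end{aligned}\right\} \tag{32}$$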
7. THE NORMAL DISTRIBUTION
7-1. We take for our first example of a special distribution a case of the general normal
distribution and briefly reconsider some of its properties in the light of our general theory.
A vector $X = (x_1, x_2, \dots, x_s)$ has a normal distribution if
$$p(X) = C\exp\{-\tfrac12 Q(x_1, x_2, \dots, x_s)\},$$
where C is a constant and Q is a positive definite quadratic form in $x_1, x_2, \dots, x_s$, which in the
case of a spherical distribution must be a sum of squares with equal coefficients. Thus
$$p(r) = C\exp(-\tfrac12 r^2/A),$$
where A is a constant. Using Watson (1944, p. 394 (4)), we find for the characteristic function
$$\Phi(\rho) = C(2\pi A)^{\frac12 s}\exp(-\tfrac12 A\rho^2).$$
Then $\Phi(0) = 1$ gives $C = (2\pi A)^{-\frac12 s}$, and expanding in powers of $\rho$ we find $A = \mu_2/s$, where
$\mu_2$ is the second moment of the distribution. Hence

$$p(r) = \{s/(2\pi\mu_2)\}^{\frac12 s}\exp(-\tfrac12 sr^2/\mu_2), \tag{33}$$

$$P(r) = \frac{2}{\Gamma(\frac12 s)}\left(\frac{s}{2\mu_2}\right)^{\frac12 s}r^{s-1}\exp(-\tfrac12 sr^2/\mu_2), \tag{34}$$
$$\Phi(\rho) = \exp(-\tfrac12\mu_2\rho^2/s). \tag{35}$$

Obviously all cumulants except $\lambda_2 = \mu_2$ are zero, and the higher moments are
$$\mu_{2\nu} = (s+2)(s+4)\dots(s+2\nu-2)\,s^{1-\nu}\mu_2^{\nu}.$$

The distribution function is found by integrating (34) and when s is even is expressible 
in terms of exponential functions and when s is odd in terms of the ‘normal probability 
integral’. Differences of this sort between odd and even s will be often found with other 
distributions. 
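Formulae (34) and (35) can be checked against the general relation (9) by direct quadrature. The sketch below is illustrative only: it sums the series (8) for $\Lambda_\nu$, integrates $P(r)\,\Lambda_{\frac12 s-1}(r\rho)$ with a midpoint rule, and compares with $\exp(-\tfrac12\mu_2\rho^2/s)$ for a few values of s. Function names, the step length and the truncation point `r_max` are assumptions chosen for $\mu_2$ of order unity.

```python
import math


def lam(nu, z, max_terms=200):
    """Lambda_nu(z) from the power series (8)."""
    term = total = 1.0
    for k in range(max_terms):
        term *= -(z * z / 4.0) / ((k + 1) * (nu + k + 1))
        total += term
        if abs(term) < 1e-17 * max(1.0, abs(total)):
            break
    return total


def normal_P(r, s, mu2):
    """Radial probability function (34) of the spherical normal distribution."""
    return (2.0 / math.gamma(s / 2.0) * (s / (2.0 * mu2)) ** (s / 2.0)
            * r ** (s - 1) * math.exp(-0.5 * s * r * r / mu2))


def phi_from_P(rho, s, mu2, r_max=12.0, h=2e-3):
    """Right-hand side of (9), evaluated by the midpoint rule on (0, r_max)."""
    n = int(r_max / h)
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h
        total += normal_P(r, s, mu2) * lam(s / 2.0 - 1.0, r * rho)
    return total * h


for s in (1, 2, 5):
    print(s, phi_from_P(1.3, s, 1.0), math.exp(-0.5 * 1.3 ** 2 / s))
```

The same routine could be pointed at any other P(r) to tabulate its characteristic function, subject to the series remaining accurate over the range of $r\rho$ that matters.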

7-2. By multiplication of characteristic functions it follows exactly as in the one-dimensional
case that the sum of independent normally distributed random vectors is itself so
distributed with second moment equal to the sum of second moments of the component
vectors. Also since all spherical normal distributions have the same characteristic function
(35), the difference being only in the scale factor $\mu_2/s$, the projection of a spherical normal
distribution is also normal.

The normal distribution has the same importance here as in the standard theory; it is
the usual limiting form of the distribution of the sum of n independent vectors as n tends to
infinity (central limit theorem). Taking the case when each vector has the same distribution
with cumulants $\lambda_2 = \mu_2, \lambda_4, \dots$, the sum will have cumulants $n\lambda_2, n\lambda_4, \dots$. Changing the unit
so that the second moment becomes unity, the cumulants of the sum are 1, $\lambda_4/(n\mu_2^2)$,
$\lambda_6/(n^2\mu_2^3)$, ..., tending to limiting values 1, 0, 0, ..., as n tends to infinity. These limiting values
are the cumulants of a normal distribution. Thus the limiting probability function for large
n will be given by
$$P_n(r) = \frac{2}{\Gamma(\frac12 s)}\left(\frac{s}{2n\mu_2}\right)^{\frac12 s}r^{s-1}\exp\left(-\frac{sr^2}{2n\mu_2}\right). \tag{36}$$
This equation gives an approximation which is often accurate enough for most purposes if
n is as small as 10. It may be noted here that for distributions of the same class the approximation
for a given n commonly improves as s increases.

The gaps in this deduction can probably be bridged by a more rigorous discussion using
Hankel-Stieltjes transforms. We can certainly appeal to the known general theory, which
shows that a sufficient condition for the validity of (36) is the existence of $\mu_2$.
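The quality of the approximation (36) for n = 10 can be examined by simulation in the simplest case, a plane random walk (s = 2) with steps of unit length, so that $\mu_2 = 1$. Integrating (36) from 0 to r then gives $1 - \exp(-r^2/n)$. The sketch below is an illustration, not a computation from the paper; the function names, trial count and seed are assumptions.

```python
import math
import random


def walk_length(n, rng):
    """Length of the sum of n unit vectors with independent uniform directions in the plane."""
    x = y = 0.0
    for _ in range(n):
        a = rng.uniform(0.0, 2.0 * math.pi)
        x += math.cos(a)
        y += math.sin(a)
    return math.hypot(x, y)


def compare(n=10, r=3.0, trials=20_000, seed=7):
    """Return (simulated, approximate) values of the probability that the resultant is below r."""
    rng = random.Random(seed)
    hits = sum(walk_length(n, rng) < r for _ in range(trials))
    simulated = hits / trials
    approx = 1.0 - math.exp(-r * r / n)   # integral of (36) with s = 2, mu_2 = 1
    return simulated, approx


print(compare())
```

The two numbers should agree to roughly the second decimal place; the residual difference reflects both sampling error and the neglected higher cumulants.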
8. THE CAUCHY DISTRIBUTION

8-1. The Cauchy distribution in s dimensions with parameter a has probability function
$$p(r) = C(a^2+r^2)^{-\frac12 s-\frac12} \quad (C = \text{constant}). \tag{37}$$

It answers the following question in geometrical probability: O is the foot of the perpendicular,
of length a, from a given point A on to a given hyperplane (or line, if s = 1); lines
are drawn at random through A to meet the hyperplane, all directions being equally likely;
what is the probability that the line meets an element at distance r from O? If s = 1, the
distribution is well enough known to figure in many text-books, but for s > 1 it may be new.
The present interest of the Cauchy distribution is that it provides an example for which
the distribution of the sum of vectors does not tend to the normal distribution. For using
Watson (1944, p. 434 (2)) to evaluate the characteristic function and the constant C, we
find that for all values of s $$\Phi(\rho) = e^{-a\rho}. \tag{38}$$

Hence the sum of n vectors with Cauchy distributions with parameters $a_1, a_2, \dots, a_n$ is
a Cauchy distribution with parameter $(a_1 + a_2 + \dots + a_n)$ and does not tend to normality as
n tends to infinity. The result of §7-3 is inapplicable since no moments exist.

Also (38) shows immediately that the projection of a Cauchy distribution with parameter
a is another Cauchy distribution with that parameter.
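The additivity of the parameter can be illustrated by simulating the geometrical construction for s = 2 (a plane at distance a from the point A). For this case (37) integrates to $F(r) = 1 - a/\sqrt{a^2+r^2}$, so the median of |X| is $a\sqrt3$; the sum of two such vectors with parameters 1 and 2 should therefore have median radius $3\sqrt3$. The code is a sketch only: the sampling of a random direction by normalising three Gaussian variates, and all names and sample sizes, are assumptions made for the example.

```python
import math
import random


def cauchy_vector(a, rng):
    """Point where a uniformly random line through A meets the plane at distance a (s = 2)."""
    z1, z2, w = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    while w == 0.0:            # guard against division by zero
        w = rng.gauss(0, 1)
    return a * z1 / abs(w), a * z2 / abs(w)


def median_radius_of_sum(params, trials=50_000, seed=11):
    """Sample median of |X_1 + X_2 + ...| for plane Cauchy vectors with the given parameters."""
    rng = random.Random(seed)
    radii = []
    for _ in range(trials):
        x = y = 0.0
        for a in params:
            dx, dy = cauchy_vector(a, rng)
            x += dx
            y += dy
        radii.append(math.hypot(x, y))
    radii.sort()
    return radii[trials // 2]


print(median_radius_of_sum([1.0, 2.0]), 3.0 * math.sqrt(3.0))
```

The median is used rather than a moment because, as noted above, no moments exist.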


9. THE EXPONENTIAL DISTRIBUTION 
9-1. The distribution for which p(r) is proportional to exp(-r/b) will be called an
exponential distribution with parameter b. Using Watson (1944, p. 386 (6)) to evaluate the
characteristic function, and (14) to evaluate the constant of proportionality, we find

$$p(r) = \pi^{\frac12-\frac12 s}(2b)^{-s}e^{-r/b}/\Gamma(\tfrac12 s+\tfrac12), \tag{39}$$
$$\Phi(\rho) = (1+b^2\rho^2)^{-\frac12 s-\frac12}, \tag{40}$$
$$\mu_{2\nu} = s(s+1)\dots(s+2\nu-1)\,b^{2\nu}, \tag{41}$$
$$\lambda_{2\nu} = 2^{\nu-1}(\nu-1)!\,(s+1)\,s(s+2)\dots(s+2\nu-2)\,b^{2\nu}. \tag{42}$$
9-2. For the sum of n vectors, each with the same exponential distribution, the characteristic
function is $$\Phi_n(\rho) = (1+b^2\rho^2)^{-\frac12 ns-\frac12 n}, \tag{43}$$
and inverting, using Watson (1944, p. 434 (2)), we obtain the probability function
$$p_n(r) = Ab^{-s}\left(\frac{r}{b}\right)^{\alpha}K_\alpha\left(\frac{r}{b}\right), \tag{44}$$
where $2\alpha = ns+n-s$, $A = 2^{1-s-\alpha}\pi^{-\frac12 s}/\Gamma(\tfrac12 ns+\tfrac12 n)$,

and $K_\alpha(x)$ denotes the second kind of Bessel function of imaginary argument. Whenever
$\alpha$ is half an odd integer, $p_n(r)$ reduces to p(r) multiplied by a polynomial in r/b of degree
$\alpha-\frac12$ (Watson 1944, p. 80 (12)).

Comparing (40) and (43) we see that $\Phi_n(\rho)$ is the characteristic function of an exponential
distribution in (ns + n - 1) dimensions. Hence the sum of n vectors with identical exponential
distributions with parameter b is the projection of a vector in (ns + n - 1) dimensions also
with an exponential distribution with parameter b.

When the vectors to be added have distributions with different parameters $b_1, b_2, \dots$, the
characteristic function of their sum will be

$$\Phi_n(\rho) = (1+b_1^2\rho^2)^{-\beta}(1+b_2^2\rho^2)^{-\beta}\dots(1+b_n^2\rho^2)^{-\beta},$$

where $\beta = \frac12 s+\frac12$. When s is even the integral for $p_n(r)$ seems intractable even for the simple
case s = 2 = n. When s is odd, $\beta$ is an integer and $\Phi_n(\rho)$ can be split into partial fractions
each of which will contribute to $p_n(r)$ a term like that on the right-hand side of (44).

9-3. Distributions of type (44) with $\alpha$ real and positive provide a generalization of the
exponential integral, which would appear to be the 'right' generalization to s dimensions
of the gamma or Type III distribution of statistics. We may rewrite the probability function

as $$p(r) = Ab^{-s}(r/b)^{\alpha}K_\alpha(r/b), \tag{45}$$
with $A = 2^{1-s-\alpha}\pi^{-\frac12 s}/\Gamma(\alpha+\tfrac12 s)$, and the characteristic function as
$$\Phi(\rho) = (1+b^2\rho^2)^{-\alpha-\frac12 s}. \tag{46}$$

The parameters $\alpha$, b determine the distribution, b being a scale parameter; $\alpha = \frac12$ gives
the exponential distribution.

By multiplication of characteristic functions it is clear that the sum of two vectors whose
distributions have parameters $\alpha_1$, b and $\alpha_2$, b has a distribution of the same type with
parameters $\alpha_1+\alpha_2+\frac12 s$, b. It is also clear from (46) that distributions with the same b and
the same $(\alpha+\frac12 s)$ are related by projection.

The distribution function for (45) is

$$F(r) = B\int_0^{r/b}u^{\alpha+s-1}K_\alpha(u)\,du, \tag{47}$$

where $B = 2^{2-s-\alpha}/\{\Gamma(\tfrac12 s)\,\Gamma(\alpha+\tfrac12 s)\}$. If s = 2, this is

$$F(r) = 1 - B(r/b)^{\alpha+1}K_{\alpha+1}(r/b), \tag{48}$$
where $B = 2^{-\alpha}/\Gamma(\alpha+1)$, since

$$\frac{d}{du}\{u^{\nu}K_\nu(u)\} = -u^{\nu}K_{\nu-1}(u).$$

If s is any other even number, F(r) can be similarly obtained by integration by parts. If
s is odd, the integration can be made to depend on the evaluation of

$$\int_0^x u^{\beta}K_\beta(u)\,du,$$

which has been tabulated by Pearson and others for $\beta = 0(\frac12)11$ (see Fletcher, Miller
& Rosenhead (1946) for precise references); Pearson has also tabulated $u^{\beta}K_\beta(u)$ for the
same range of values of $\beta$. If $\alpha$ is half an odd integer we can evaluate 1 - F(r) for all values
of s as $e^{-r/b}$ multiplied by a polynomial of degree $\alpha+s-\frac32$.
A NOTE ON THE CONSISTENCY AND MAXIMA OF
THE ROOTS OF LIKELIHOOD EQUATIONS 


By K. C. CHANDA, Bombay University 


1. INTRODUCTION 

An exhaustive study of the properties of the maximum-likelihood estimates was made 
earlier by Huzurbazar (1948). He did, however, confine his discussions to distributions with 
only one unknown parameter. Although the extension of his results to multi-parametric 
distributions is conceptually easy, the algebraic details of such derivations are sometimes 
complex and at times quite interesting to follow. The consistency of the maximum-likelihood 
estimates for the general case was, however, deduced later by Wald (1949), but on the 
assumption that the estimate really maximizes the likelihood function; it is thus different 
from the maximum-likelihood estimate as defined by Fisher. No such assumptions are 
made here; but the basic assumptions underlying the proofs are completely different from, 
and in a sense stronger than, those of Wald. 


2. ASSUMPTIONS 


Let $f(x, \theta)$ be the probability density law; $\theta = (\theta_1, \dots, \theta_k)$ is the unknown parameter
vector and let $x_1, \dots, x_n$ be n independent observations on x. The likelihood equations for
estimating $\theta$ are then given by

$$\frac{\partial\log\phi}{\partial\theta_r} = 0 \quad (r = 1, 2, \dots, k),$$
where $$\log\phi = \sum_{i=1}^{n}\log f(x_i, \theta).$$

For brevity's sake we shall henceforth write f for $f(x, \theta)$ and $f_i$ for $f(x_i, \theta)$.
Let us now assume that: 
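(i) The point represented by the vector $\theta$ lies in a k-dimensional interval $\Omega$; for almost
all x and for all $\theta\in\Omega$
$$\frac{\partial\log f}{\partial\theta_r}, \qquad \frac{\partial^2\log f}{\partial\theta_r\,\partial\theta_s}, \qquad \frac{\partial^3\log f}{\partial\theta_r\,\partial\theta_s\,\partial\theta_t}$$
exist for all r, s, t = 1, 2, ..., k.
(ii) For almost all x and for every point $\theta\in\Omega$

$$\left|\frac{\partial f}{\partial\theta_r}\right| < F_r(x), \qquad \left|\frac{\partial^2f}{\partial\theta_r\,\partial\theta_s}\right| < F_{rs}(x) \quad\text{and}\quad \left|\frac{\partial^3\log f}{\partial\theta_r\,\partial\theta_s\,\partial\theta_t}\right| < H_{rst}(x),$$

while $$\int_{-\infty}^{\infty}H_{rst}(x)\,f\,dx < M$$
for all $\theta\in\Omega$ and for all r, s, t = 1, 2, ..., k, M being a finite positive constant.
(iii) For all $\theta\in\Omega$ the matrix $J = ((J_{rs}(\theta)))$, where

$$J_{rs}(\theta) = \int_{-\infty}^{\infty}\frac{\partial\log f}{\partial\theta_r}\,\frac{\partial\log f}{\partial\theta_s}\,f\,dx,$$

is positive-definite and |J| is finite.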
Assumption (iii) forms the pivot of later proofs; but it is not, as it stands, very
restrictive. The reason is that, whenever we are studying the properties of different estimates
(which include maximum-likelihood estimates as well), we have always at the background
the concept of efficiency which in turn is measured by a simple function of the
information-determinant |J|. For the class of unbiased estimates, the generalized variance is bounded
below by a quantity which assumes the trivial value, zero, when $|J| = \infty$. This assumption
is thus felt to be of quite general importance. As J is a dispersion matrix it is always
positive-definite, except perhaps in degenerate cases. This may be proved very simply as follows:

Let $(\lambda_1, \lambda_2, \dots, \lambda_k)$ be any real row vector. Clearly

$$E\left\{\lambda_1\frac{\partial\log f}{\partial\theta_1} + \lambda_2\frac{\partial\log f}{\partial\theta_2} + \dots + \lambda_k\frac{\partial\log f}{\partial\theta_k}\right\}^2 \ge 0,$$
i.e. $$\sum_r\sum_s\lambda_r\lambda_sJ_{rs} \ge 0,$$

which proves J to be positive-definite.


3. DERIVATION OF RESULTS 


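Let $\theta^0$ be the unknown true value of the parameter vector $\theta$, where we suppose that
$\theta^0$ is an inner point of $\Omega$. Consider the following expansion:

$$\frac{\partial\log f}{\partial\theta_r} = \left(\frac{\partial\log f}{\partial\theta_r}\right)_{\theta=\theta^0} + \sum_{s=1}^{k}(\theta_s-\theta_s^0)\left(\frac{\partial^2\log f}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\theta^0} + \frac12\sum_{s,t=1}^{k}(\theta_s-\theta_s^0)(\theta_t-\theta_t^0)\left(\frac{\partial^3\log f}{\partial\theta_r\,\partial\theta_s\,\partial\theta_t}\right)_{\theta=\theta'}, \tag{3-1}$$

where $\theta' = \theta'(x)$ is a value depending on x, but for all x, lying inside the hyper-cell, of which
the vector $\theta-\theta^0$ is the diagonal. Multiplying both sides by 1/n and summing the corresponding
expressions for $x_i$'s over i = 1, 2, ..., n we may rewrite (3-1) as

$$L_r(\theta) = L_r(\theta^0) - \sum_{s=1}^{k}\delta_sL_{rs}(\theta^0) + \frac12\sum_{s,t=1}^{k}\delta_s\delta_tL_{rst} \quad (r = 1, 2, \dots, k), \tag{3-2}$$
where $\delta_s = (\theta_s-\theta_s^0)$,
$$\left.\begin{aligned}
L_r(\theta) &= \frac1n\sum_{i=1}^{n}\frac{\partial\log f_i}{\partial\theta_r} = \frac1n\,\frac{\partial\log\phi}{\partial\theta_r},\\
L_{rs}(\theta) &= -\frac1n\sum_{i=1}^{n}\frac{\partial^2\log f_i}{\partial\theta_r\,\partial\theta_s} = -\frac1n\,\frac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s},\\
L_{rst} &= \frac1n\sum_{i=1}^{n}\left(\frac{\partial^3\log f_i}{\partial\theta_r\,\partial\theta_s\,\partial\theta_t}\right)_{\theta=\theta'(x_i)}.
\end{aligned}\right\} \tag{3-3}$$
We note that
$$\left.\begin{aligned}
&\text{(i)}\quad E\left(\frac{\partial\log f}{\partial\theta_r}\right) = 0 \quad (r = 1, 2, \dots, k),\\
&\text{(ii)}\quad E\left(-\frac{\partial^2\log f}{\partial\theta_r\,\partial\theta_s}\right) = J_{rs}(\theta) \quad (r, s = 1, 2, \dots, k),\\
&\text{(iii)}\quad E\left(\left|\frac{\partial^3\log f}{\partial\theta_r\,\partial\theta_s\,\partial\theta_t}\right|\right) < M \quad (r, s, t = 1, 2, \dots, k),
\end{aligned}\right\} \tag{3-4}$$

for all $\theta\in\Omega$.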
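Relations (i) and (ii) of (3-4) can be illustrated by simulation in a one-parameter case. The sketch below is not from the paper: it assumes the density $f(x, \theta) = \theta e^{-\theta x}$ (x > 0), for which $\partial\log f/\partial\theta = 1/\theta - x$ and $-\partial^2\log f/\partial\theta^2 = 1/\theta^2$, and compares the sample mean of the first derivative with zero and the sample mean of its square with $1/\theta^2$. Names, sample size and seed are illustrative.

```python
import random


def score_checks(theta=2.0, n=200_000, seed=3):
    """Sample means of the score and its square, and -d2 log f / d theta2, for theta*exp(-theta*x)."""
    rng = random.Random(seed)
    s1 = s2 = 0.0
    for _ in range(n):
        x = rng.expovariate(theta)
        score = 1.0 / theta - x           # d log f / d theta
        s1 += score
        s2 += score * score
    neg_second = 1.0 / theta ** 2         # -d^2 log f / d theta^2, free of x in this model
    return s1 / n, s2 / n, neg_second


print(score_checks())
```

The first value should be near zero and the second near the third, showing in this example that the mean square of the first derivative and the mean of minus the second derivative coincide.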
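From (iii) it follows immediately that

$$\left|\frac{\partial^3\log f}{\partial\theta_r\,\partial\theta_s\,\partial\theta_t}\right|_{\theta=\theta'(x)} < M \quad\text{for all } x$$

(since $\theta'(x)\in\Omega$ for all x). Also

$$|L_{rst}| \le \frac1n\sum_{i=1}^{n}\left|\frac{\partial^3\log f_i}{\partial\theta_r\,\partial\theta_s\,\partial\theta_t}\right|_{\theta=\theta'(x_i)} < \frac1n\sum_{i=1}^{n}H_{rst}(x_i)$$
and $$E\{H_{rst}(x_i)\} < M$$
for all i = 1, 2, ..., n; so by Khintchine's theorem,

$$\left.\begin{aligned}
&\text{(i)}\quad L_r(\theta^0)\to0,\\
&\text{(ii)}\quad L_{rs}(\theta^0)\to J_{rs}(\theta^0),\\
&\text{(iii)}\quad |L_{rst}| < \frac1n\sum_{i=1}^{n}H_{rst}(x_i)\to E\{H_{rst}(x)\} < M, \ \text{i.e. } |L_{rst}|\to B < M,
\end{aligned}\right\} \tag{3-5}$$

where B is a finite-positive constant, with probability tending to unity. That is to say, we
can choose an $n_0 = n_0(\eta, \epsilon)$, such that for all $n > n_0(\eta, \epsilon)$

$$\Pr[\,|L_r(\theta^0)| < \eta;\ |L_{rs}(\theta^0)-J_{rs}(\theta^0)| < \eta;\ |L_{rst}| < N\,] > 1-\epsilon, \tag{3-6}$$

N being a finite-positive quantity greater than M and $\eta$ and $\epsilon$ being two arbitrarily chosen
small positive quantities. The likelihood equations are therefore given by putting $L_r(\theta) = 0$
(r = 1, 2, ..., k) in (3-2), i.e.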


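$$L_r(\theta^0) - \sum_{s=1}^{k}\delta_sL_{rs}(\theta^0) + \frac12\sum_{s,t=1}^{k}\delta_s\delta_tL_{rst} = 0 \quad (r = 1, 2, \dots, k), \tag{3-7}$$

i.e. $$\sum_{s=1}^{k}\delta_sL_{rs}(\theta^0) = L_r(\theta^0) + \frac12\sum_{s,t=1}^{k}\delta_s\delta_tL_{rst} \quad (r = 1, 2, \dots, k).$$

Again, since $E(L_{rs}(\theta^0)) = J_{rs}(\theta^0)$ and since $((J_{rs}(\theta^0)))^{-1} = J_0^{-1}$ (say) exists, $((L_{rs}(\theta^0)))^{-1} = L^{-1}$
(say) exists, too, and so we have from (3-7)

$$\delta_r = \beta_r + \sum_{s,t=1}^{k}\delta_s\delta_t\,\alpha_{rst}, \tag{3-8}$$
where $$\beta_r = \sum_{p=1}^{k}L_p(\theta^0)\,L^{pr}(\theta^0),$$

$$\alpha_{rst} = \frac12\sum_{p=1}^{k}L_{pst}\,L^{pr}(\theta^0),$$

$$((L^{pr}(\theta^0))) = L^{-1}.$$

Since $L^{-1}$ is a matrix with finite elements, we can choose our $n_0(\eta, \epsilon)$ for arbitrary $\eta$, $\epsilon$ such
that for all $n > n_0(\eta, \epsilon)$

$$\Pr\left[\,|\beta_r| \le \sum_{p=1}^{k}|L_p(\theta^0)|\,|L^{pr}(\theta^0)| < \eta;\ \ |\alpha_{rst}| \le \frac12\sum_{p=1}^{k}|L_{pst}|\,|L^{pr}(\theta^0)| < \tau \ \text{ for all } r = 1, 2, \dots, k\right] > 1-\epsilon \quad\text{for } \infty > \tau > N. \tag{3-9}$$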
Consider now the set of equations
$$\delta_r = \beta_r + \sum_{s,t=1}^{k}\delta_s\delta_t\,\alpha_{rst} \quad (r = 1, 2, \dots, k).$$

It is easily seen that this set of equations admits a solution $\hat\delta = (\hat\delta_1, \dots, \hat\delta_k)$ which are of the
same order as $\eta$. For if we take $\eta$ to be arbitrarily small, then in view of (3-9) it follows that
the contribution due to the second term on the right-hand side can be made to be equal to
a quantity of smaller order than $\eta$; and so

$$\Pr[\,|\hat\delta| < \eta\,] > 1-\epsilon$$

for a sufficiently large value of n and for all values larger than that. It follows immediately,
therefore, that there exists at least one solution of the likelihood equations, which is a consistent
estimate of the true parameter vector $\theta^0$. It is not, however, at once evident that this
equation does not admit any other solution which may be inconsistent. We propose, however,
to show that the consistent solution is unique. But before that we shall prove the following

THEOREM 1. The matrix $\Theta = \left(\left(\dfrac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\hat\theta}\right)$, where $\hat\theta$ is a consistent root of the likelihood
equation, is negative-definite with probability tending to unity.
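Proof. Consider, in fact, the relation

$$\left(\frac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\hat\theta} = \left(\frac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\theta^0} + \sum_{t=1}^{k}(\hat\theta_t-\theta_t^0)\left(\frac{\partial^3\log\phi}{\partial\theta_r\,\partial\theta_s\,\partial\theta_t}\right)_{\theta=\theta''},$$
where as before $\theta'' = \theta''(x_1, \dots, x_n)$ is, for all $(x_1, \dots, x_n)$, an inner point of the hyper-cell of
which the vector $\hat\theta-\theta^0$ is the diagonal.
We have
$$\left|\frac1n\left(\frac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\hat\theta} - \frac1n\left(\frac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\theta^0}\right| \le \sum_{t=1}^{k}|\hat\theta_t-\theta_t^0|\,\left|\frac1n\,\frac{\partial^3\log\phi}{\partial\theta_r\,\partial\theta_s\,\partial\theta_t}\right|_{\theta=\theta''}.$$
Since $\hat\theta$ is a consistent estimate of $\theta^0$, and since $\theta''(x_1, \dots, x_n)$ is an interior point of $\Omega$,
$$\Pr\left[\,|\hat\theta_t-\theta_t^0| < \eta/2kN \ (t = 1, 2, \dots, k);\ \ \frac1n\left|\frac{\partial^3\log\phi}{\partial\theta_r\,\partial\theta_s\,\partial\theta_t}\right|_{\theta=\theta''} < N\right] > 1-\epsilon \quad\text{for all } n > n_1(\eta, \epsilon). \tag{3-12}$$
Hence for $n > n_1(\eta, \epsilon)$, we have
$$\Pr\left[\,\left|\frac1n\left(\frac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\hat\theta} - \frac1n\left(\frac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\theta^0}\right| < \tfrac12\eta\right] > 1-\epsilon. \tag{3-13}$$
Also $E\left\{\dfrac1n\left(\dfrac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\theta^0}\right\} = -J_{rs}(\theta^0)$, and so, by Khintchine's theorem, it follows that
$$\Pr\left[\,\left|\frac1n\left(\frac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\theta^0} + J_{rs}(\theta^0)\right| < \tfrac12\eta\right] > 1-\epsilon \tag{3-14}$$
for $n > n_2(\eta, \epsilon)$.
Again we have
$$\left|\frac1n\left(\frac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\hat\theta} + J_{rs}(\theta^0)\right| \le \left|\frac1n\left(\frac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\hat\theta} - \frac1n\left(\frac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\theta^0}\right| + \left|\frac1n\left(\frac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\theta^0} + J_{rs}(\theta^0)\right|.$$
Hence for $n > n_3 = \max(n_1, n_2)$, we have

$$\Pr\left(\,\left|\frac1n\left(\frac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\hat\theta} + J_{rs}(\theta^0)\right| < \tfrac12\eta+\tfrac12\eta = \eta\right) > 1-\epsilon. \tag{3-15}$$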
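Consider the positive-definite quadratic form

$$Q_0 = \sum_{r,s=1}^{k}u_ru_sJ_{rs}(\theta^0).$$

Let further $$Q = \sum_{r,s=1}^{k}u_ru_s\Theta_{rs},$$

where $$\Theta_{rs} = \frac1n\left(\frac{\partial^2\log\phi}{\partial\theta_r\,\partial\theta_s}\right)_{\theta=\hat\theta}.$$
We have for $n > n_3$,
$$\Pr[\,-Q_0-\eta(\Sigma u_r)^2 < Q < -Q_0+\eta(\Sigma u_r)^2\,] > 1-\epsilon. \tag{3-16}$$
Since $$Q_0 = \sum_{r,s=1}^{k}u_ru_sJ_{rs}(\theta^0) \quad\text{and}\quad Q_0' = \Bigl(\sum_{r=1}^{k}u_r\Bigr)^2$$

are respectively of rank k and 1, it follows that we can by a non-singular linear transformation
$u_r\to U_r$ convert $Q_0$ and $Q_0'$ to $\sum_{r=1}^{k}U_r^2$ and $\lambda U_1^2$ respectively, where $\lambda > 0$ is the only non-zero
root of the characteristic equation $|A-\lambda J_0| = 0$, A being the matrix with all unit
elements. Evidently $0 < \lambda < \infty$, and so

$$\frac{Q_0}{\bigl(\sum_{r=1}^{k}u_r\bigr)^2} = \frac{\sum_{r=1}^{k}U_r^2}{\lambda U_1^2} \ge \frac1\lambda > \xi \ \text{(say)}. \tag{3-17}$$

By taking $\eta = \xi$ and by choosing an $n_4(\eta, \epsilon)$ we can ensure that
$$\Pr[\,Q < 0\,] > 1-\epsilon \quad\text{for all } n > n_4(\eta, \epsilon)$$

and so $$\Pr[\,\Theta \text{ is negative-definite}\,] > 1-\epsilon \quad\text{for all } n > n_4(\eta, \epsilon). \tag{3-18}$$

It follows immediately that the likelihood function has a relative maximum at $\theta = (\hat\theta_1, \dots, \hat\theta_k)$
with probability tending to unity. Now we prove the following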


THEOREM 2. Of all possible solutions to equations (3·8), one and only one tends in
probability to the true parameter vector $\theta^0$.

Proof. Suppose, if possible, that $\hat\theta'$ is another consistent estimate and at the same time
a solution of (3·8). By hypothesis,


dlog 2) i (Se £) df ~ : ; 
(=Be Whe, ay eas for r = i he SY (3 19) 





Now let us consider only one such pair of equations, say the rth. It follows, therefore, in 
virtue of Rolle’s theorem extended to multivariate functions that there exists a point 67 
(depending on r), lying inside the hyper-cell of which the vector 6 — 0’ is the diagonal, such 
that 





log ¢d ~ 7“ ’ 
(eSF),.7° for s = 1, 2,...,k. (3-20) 


in 


ne 
2
It is therefore immediately evident that the determinant-of the matrix ((5-8F ran) ) 
0=6; 


vanishes. This, however, is a contradiction, since 67 being a consistent estimate of 7° (since 
it lies inside the hyper-cell with diagonal 6 — 6’ and since 6, 6’ are both consistent estimates 





(2 
of 0°), the Hermitean ((5 jee | ) must be negative-definite with probability tending 
Pp 0=67 


to unity, as we have seen earlier; hence the theorem is proved. 
Consider now the likelihood equations 


xe L,,(@°) = LO) +5 1S 6,6,0,, (f= 1,2,...,B). (3-21) 
r.t=1 


We note that:

(1) $L_r(\theta^0) = \dfrac{1}{n}\sum_{i=1}^{n}\left(\dfrac{\partial \log f_i}{\partial\theta_r}\right)_{\theta=\theta^0}$, being the average of n independent variables, each having
identical distribution, is itself distributed asymptotically normally with zero mean and
variance $J_{rr}(\theta^0)/n$. Again in virtue of the generalization of Liapounoff's central limit
theorem, it follows that $L_1(\theta^0), \ldots, L_k(\theta^0)$ are asymptotically jointly distributed as a k-variate
normal distribution, with zero means and variance-covariance matrix $J_0/n$.

(2) $L_{rs}(\theta^0) \to J_{rs}(\theta^0)$ with probability tending to unity.

(3) The second expression on the right-hand side of (3·21) is of a smaller order than
$\sum_r \delta_r L_{rt}(\theta^0)$.

It follows, therefore, in virtue of Khintchine's theorem, that the $\delta_r$'s have asymptotically
a joint k-variate normal distribution with zero means and the variance-covariance matrix
given by $V = ((nJ_{rs}(\theta^0)))^{-1}$.
GROUPING METHODS IN THE FITTING OF POLYNOMIALS TO
EQUALLY SPACED OBSERVATIONS


By P. G. GUEST 
University of Sydney, Australia 


INTRODUCTION 


The fitting of polynomials to equally spaced observations by the method of least squares is 
carried out by means of power moments (Guest, 1953) or orthogonal moments (Anderson 
& Houseman, 1942), tables of orthogonal polynomials and related functions being used. 
When the number of observations is large (greater than 104) such tables are no longer avail- 
able, and it becomes necessary to group the observations before performing the calculations. 
Even if the number of observations is less than 104 it may be considered advisable to group 
the observations in order to reduce the time spent on the calculations. It is the purpose of 
Part I of the present paper to show how unbiased estimates of the polynomial coefficients 
and fitted values may be obtained by least-squares methods when the observations are 
grouped, and to determine the reduction in efficiency which occurs. 

A method of grouping which is of some interest is that in which step functions are used to 
evaluate the polynomial coefficients. The practical procedure using single-step functions 
has been outlined in an earlier paper (Guest, 1951). In Part II of the present paper the method
of determining the optimum step functions will be discussed for both single-step and double- 
step functions. The fitting of a straight line by the use of step functions, being the most 
important case in practice, will be treated in detail. 


Part I. LEAST-SQUARES METHODS 
(1) The orthogonal coefficients 


The n observations $y(\epsilon)$ are first converted into N groups each containing r observations and
represented by the symbols $Y(E)$. The values of the independent variables $\epsilon$ and $E$
corresponding to the observations are assumed to be spaced at unit intervals in the range
$+\tfrac12(n-1)$ to $-\tfrac12(n-1)$ and the range $+\tfrac12(N-1)$ to $-\tfrac12(N-1)$ respectively. Then
$$Y(E) = \sum_{z=-\frac12(r-1)}^{+\frac12(r-1)} y(rE+z). \tag{1}$$
A least-squares polynomial is then fitted to the N grouped values $Y(E)$, using either
orthogonal moments or power moments. The fitted polynomial is written
$$U_p(E) = \sum_{j=0}^{p} A_j T_{jN}(E), \tag{2}$$
where $T_{jN}(E)$ (often written $\xi_{jN}(E)$) is the orthogonal polynomial of degree j in E with
leading coefficient unity. The coefficients $A_j$ are given by
$$A_j = \sum_E T_{jN}(E)\,Y(E) \Big/ \sum_E T_{jN}^2(E). \tag{3}$$
The problem is to obtain from these values $A_j$ estimates $a_j$ of the coefficients in the
polynomial
$$u_p(\epsilon) = \sum_{j=0}^{p} a_j T_{jn}(\epsilon), \tag{4}$$
which is to fit the n original observations $y(\epsilon)$.
It is easy to show (Guest, 1952, p. 239) that if any arbitrary independent set of $p+1$
functions $W_j(\epsilon)$ is chosen the system of equations
$$\sum_{\epsilon} W_j(\epsilon)\Big[y(\epsilon) - \sum_k a_k T_{kn}(\epsilon)\Big] = 0 \tag{5}$$
leads to unbiased estimates of the polynomial coefficients. Since it is desired to express
the $a_k$ in terms of the $A_k$ given by equation (3), the functions appropriate to the present
discussion are
$$W_j(\epsilon) = T_{jN}(E), \qquad rE - \tfrac12(r-1) \le \epsilon \le rE + \tfrac12(r-1),$$
and the system of equations determining $a_k$ becomes
$$\sum_E T_{jN}(E)\Big[Y(E) - \sum_k a_k \sum_z T_{kn}(rE+z)\Big] = 0. \tag{6}$$


To simplify this expression it is necessary to expand $\sum_z T_{kn}(rE+z)$ in terms of the
polynomials $T_{jN}(E)$. Taking as an example the case $k = 3$,
$$\begin{aligned}
\sum_z T_{3n}(rE+z) &= \sum_z \big[(rE+z)^3 - \{(3n^2-7)/20\}(rE+z)\big] \\
&= \sum_z \big[r^3E^3 + 3rEz^2 - \{(3n^2-7)/20\}\,rE\big] \\
&= r^4E^3 + \{3r^2(r^2-1)/12\}\,E - \{(3n^2-7)/20\}\,r^2E \\
&= r^4T_{3N}(E) - \{r^2(r^2-1)/10\}\,T_{1N}(E).
\end{aligned}$$
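The reduction for $k = 3$ can be checked numerically. The following is only a sketch, using exact rational arithmetic; the function names (`T3`, `grouped_sum_T3`, `closed_form`, `check`) are illustrative and do not come from the paper.

```python
from fractions import Fraction


def T1(x, n):
    # T_1n(x) = x (leading coefficient unity)
    return x


def T3(x, n):
    # T_3n(x) = x^3 - {(3n^2 - 7)/20} x
    return x**3 - Fraction(3 * n * n - 7, 20) * x


def grouped_sum_T3(E, r, N):
    # Sum of T_3n(rE + z) over the r observations in the group centred at rE
    n = r * N
    zs = [Fraction(2 * i - (r - 1), 2) for i in range(r)]  # -(r-1)/2 ... +(r-1)/2
    return sum(T3(r * E + z, n) for z in zs)


def closed_form(E, r, N):
    # r^4 T_3N(E) - {r^2 (r^2 - 1)/10} T_1N(E)
    return r**4 * T3(E, N) - Fraction(r * r * (r * r - 1), 10) * T1(E, N)


def check(r, N):
    # Compare both sides at every group centre E = -(N-1)/2, ..., +(N-1)/2
    Es = [Fraction(2 * i - (N - 1), 2) for i in range(N)]
    return all(grouped_sum_T3(E, r, N) == closed_form(E, r, N) for E in Es)
```

Calling `check(r, N)` for a few small values of r and N compares the two sides at every group centre; since both sides are cubics in E, agreement at four or more distinct centres settles the identity for that r and N.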

By similar calculations the following expressions are obtained for polynomials up to the
fifth degree:
$$\begin{aligned}
\sum_z T_{0n}(rE+z) &= rT_{0N}(E); \\
\sum_z T_{1n}(rE+z) &= r^2T_{1N}(E); \\
\sum_z T_{2n}(rE+z) &= r^3T_{2N}(E); \\
\sum_z T_{3n}(rE+z) &= r^4T_{3N}(E) - \{r^2(r^2-1)/10\}\,T_{1N}(E); \\
\sum_z T_{4n}(rE+z) &= r^5T_{4N}(E) - \{3r^3(r^2-1)/7\}\,T_{2N}(E); \\
\sum_z T_{5n}(rE+z) &= r^6T_{5N}(E) - \{10r^4(r^2-1)/9\}\,T_{3N}(E) - \{r^2(r^2-1)(n^2-6r^2+8)/126\}\,T_{1N}(E).
\end{aligned}$$


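On substituting these expressions in equation (6), and using equation (3), the following
expressions are obtained for the polynomial coefficients:
$$\begin{aligned}
a_5 &= r^{-6}A_5; \\
a_4 &= r^{-5}A_4; \\
a_3 &= r^{-4}\big[A_3 + \tfrac{10}{9}(1-r^{-2})A_5\big]; \\
a_2 &= r^{-3}\big[A_2 + \tfrac{3}{7}(1-r^{-2})A_4\big]; \\
a_1 &= r^{-2}\big[A_1 + \tfrac{1}{10}(1-r^{-2})A_3 + \tfrac{1}{126}(1-r^{-2})(N^2+8-6r^{-2})A_5\big]; \\
a_0 &= r^{-1}A_0.
\end{aligned}$$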
In general
$$a_j = r^{-(j+1)} \sum_{k=j}^{p} \gamma_{jk} A_k, \tag{7}$$
where the values $\gamma_{jj}$ are unity and $\gamma_{jk}$ is zero for odd values of $j+k$. For polynomials of
degrees up to the fourth the only non-zero coefficients other than $\gamma_{jj}$ are $\gamma_{13}$ and $\gamma_{24}$. These
are listed in Table 1 for values of r from 2 to 15.


Table 1. Values of $\gamma_{13}$ and $\gamma_{24}$

   r    $\gamma_{13}$     $\gamma_{24}$          r    $\gamma_{13}$     $\gamma_{24}$
   2    3/40       9/28          9    8/81       80/189
   3    4/45       8/21         10    99/1000    297/700
   4    3/32       45/112       11    12/121     360/847
   5    12/125     72/175       12    143/1440   143/336
   6    7/72       5/12         13    84/845     72/169
   7    24/245     144/343      14    39/392     585/1372
   8    63/640     27/64        15    112/1125   32/75
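The entries of Table 1 follow from the expressions for $a_1$ and $a_2$ given above, which imply $\gamma_{13} = \tfrac{1}{10}(1-r^{-2})$ and $\gamma_{24} = \tfrac{3}{7}(1-r^{-2})$. A minimal sketch of equation (7) for degrees up to the fourth is given below; the names `gamma13`, `gamma24` and `ungroup` are illustrative only.

```python
from fractions import Fraction


def gamma13(r):
    # gamma_13 = (1/10)(1 - r^-2)
    return Fraction(1, 10) * (1 - Fraction(1, r * r))


def gamma24(r):
    # gamma_24 = (3/7)(1 - r^-2)
    return Fraction(3, 7) * (1 - Fraction(1, r * r))


def ungroup(A, r):
    """Apply equation (7) for p <= 4: a_j = r^-(j+1) * sum_k gamma_jk A_k.

    A is the list [A_0, ..., A_p] fitted to the grouped values.
    """
    p = len(A) - 1
    if p > 4:
        raise ValueError("only degrees up to the fourth are handled here")
    a = []
    for j in range(p + 1):
        total = Fraction(A[j])                 # gamma_jj = 1
        if j == 1 and p >= 3:
            total += gamma13(r) * A[3]
        if j == 2 and p >= 4:
            total += gamma24(r) * A[4]
        a.append(total / Fraction(r) ** (j + 1))
    return a
```

With r = 2, `gamma13(2)` and `gamma24(2)` return 3/40 and 9/28, the first row of Table 1.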
In tables of orthogonal polynomials the polynomial tabulated is
$$\xi'_{jn}(\epsilon) = \lambda_{jn} T_{jn}(\epsilon).$$
Thus if
$$U_p(E) = \sum_j B_j\,\xi'_{jN}(E)$$
and
$$u_p(\epsilon) = \sum_j b_j\,\xi'_{jn}(\epsilon),$$
then
$$b_j = r^{-(j+1)} \sum_{k=j}^{p} \gamma_{jk}\,(\lambda_{kN}/\lambda_{jn})\,B_k. \tag{8}$$

If the fitting is done by power moments, the coefficients $A_j$ and $a_j$ are obtained directly.


(2) Standard errors of the coefficients 
If $\sigma(y)$ denotes the standard error of y and $\sigma(Y)$ the standard error of Y, it is clear that
$$\sigma^2(Y) = r\sigma^2(y) \tag{9}$$
and
$$\sigma^2(A_j) = \sigma^2(Y) \Big/ \sum_E T_{jN}^2(E). \tag{10}$$

From equation (7) it is apparent that $\sigma^2(a_j)$ will depend on p as well as j. However, provided
N is not less than 10 the variation with p turns out to be insignificant. Considering the
coefficient $a_1$ with $p = 5$,
$$\sigma^2(a_1) = r^{-4}\big[\sigma^2(A_1) + \tfrac{1}{100}(1-r^{-2})^2\sigma^2(A_3) + \tfrac{1}{15876}(1-r^{-2})^2(N^2+8-6r^{-2})^2\sigma^2(A_5)\big].$$
On inserting the known expressions for $\sum T_{jN}^2$ (Allan, 1930) in equation (10), it is found that
$$\sigma^2(a_1) \approx r^{-4}\sigma^2(A_1)\big[1 + 6(1-r^{-2})^2N^{-4}\big].$$
Similarly it may be shown that
$$\sigma^2(a_2) \approx r^{-6}\sigma^2(A_2)\big[1 + 45(1-r^{-2})^2N^{-4}\big],$$
$$\sigma^2(a_3) \approx r^{-8}\sigma^2(A_3)\big[1 + 308(1-r^{-2})^2N^{-4}\big].$$
Thus provided N is not less than 10 or so
$$\sigma^2(a_j) \approx r^{-(2j+2)}\sigma^2(A_j). \tag{11}$$
Let the values obtained for the coefficients by fitting a least-squares polynomial to the
n observations be denoted by $a'_j$. Then
$$\sigma^2(a'_j) = \sigma^2(y) \Big/ \sum_\epsilon T_{jn}^2(\epsilon),$$
and so the efficiency of the value $a_j$ obtained from the grouped observations is
$$\begin{aligned}
\eta_j &= r^{2j+2}\,\frac{N(N^2-1)\cdots(N^2-j^2)\,\sigma^2(y)}{n(n^2-1)\cdots(n^2-j^2)\,\sigma^2(Y)} \\
&= \frac{(n^2-r^2)\cdots(n^2-j^2r^2)}{(n^2-1)\cdots(n^2-j^2)} \\
&\approx 1 - \frac{j(j+1)(2j+1)}{6N^2}\{1-r^{-2}\}
 + \frac{j(j+1)(2j+1)(10j^3-3j^2-13j+6)}{360N^4}\{1-r^{-4}\}
 - \frac{j^2(j+1)^2(2j+1)^2}{36N^4}\{r^{-2}-r^{-4}\}.
\end{aligned} \tag{12}$$

From this expression the minimum number N of groups which must be retained for the
efficiency to exceed a specified value can be determined. Table 2 gives the minimum number
of groups required to ensure that the efficiencies do not fall below 0.98, 0.95, 0.90, 0.80 and
0.70. These values are calculated from (12) by neglecting terms in $r^{-2}$. If r is small a slightly
smaller number of groups would be permissible; as an example, for $a_3$ with $r = 2$, the value
of N for an efficiency of 0.95 drops from 17 to 15.


Table 2. Minimum values of N, the number of groups, to obtain stated efficiencies

                           Degree j
  Efficiency $\eta_j$     1     2     3     4     5
  0.98               8    16    27    39    53
  0.95               5    10    17    25    33
  0.90               4     8    12    18    24
  0.80               3     5     9    12    16
  0.70               -     4     7    10    13
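The tabulated minima can be reproduced by a direct search. The sketch below is an assumption-laden illustration: it uses the exact product form of the efficiency rather than the series (12), takes the limit of large r as $\prod_{i=1}^{j}(1 - i^2/N^2)$, and starts the search at $N = j+1$; the names `efficiency` and `min_groups` are hypothetical.

```python
from fractions import Fraction


def efficiency(j, N, r=None):
    """Efficiency of a_j from N groups of r observations.

    Uses prod_i (n^2 - i^2 r^2)/(n^2 - i^2) with n = rN.
    r=None gives the limit of large r, prod_i (1 - i^2/N^2).
    """
    eta = Fraction(1)
    for i in range(1, j + 1):
        if r is None:
            eta *= 1 - Fraction(i * i, N * N)
        else:
            n = r * N
            eta *= Fraction(n * n - i * i * r * r, n * n - i * i)
    return eta


def min_groups(j, target, r=None):
    # Smallest N (searched upward from j + 1) whose efficiency reaches the target
    target = Fraction(target)
    N = j + 1
    while efficiency(j, N, r) < target:
        N += 1
    return N
```

For instance `min_groups(3, "0.95")` gives 17 and `min_groups(3, "0.95", r=2)` gives 15, the two values quoted in the text. For $j = 1$ at 0.70 the search returns 2, the smallest N it tries, where the table shows no entry.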


























(3) Standard errors of the fitted values 


For the fitted values
$$u_p(\epsilon) = \sum_j a_j T_{jn}(\epsilon),$$
the standard errors may be found by expanding the coefficients $a_j$ using equation (7),
collecting terms in $A_k$, and applying equation (10). Thus, neglecting terms in $r^{-2}$ in the
expressions for $\gamma_{jk}$,
$$\begin{aligned}
\sigma^2[u_5(\epsilon)] ={}& r^{-2}\sigma^2(A_0) + T_1^2(\epsilon)\,r^{-4}\sigma^2(A_1) + T_2^2(\epsilon)\,r^{-6}\sigma^2(A_2) \\
&+ \big[T_3(\epsilon) + \tfrac{1}{10}r^2T_1(\epsilon)\big]^2 r^{-8}\sigma^2(A_3) \\
&+ \big[T_4(\epsilon) + \tfrac{3}{7}r^2T_2(\epsilon)\big]^2 r^{-10}\sigma^2(A_4) \\
&+ \big[T_5(\epsilon) + \tfrac{10}{9}r^2T_3(\epsilon) + \tfrac{1}{126}r^2n^2T_1(\epsilon)\big]^2 r^{-12}\sigma^2(A_5).
\end{aligned}$$
This is clearly a complicated function of $\epsilon$. However, the leading terms $T_j(\epsilon)$ inside the square
brackets will be much greater than the other terms except when $T_j(\epsilon)$ is small, and then the
contribution of the expression to the standard error is itself small. Hence to a reasonable
approximation
$$\sigma^2[u_p(\epsilon)] \approx \sum_j T_j^2(\epsilon)\,\sigma^2(a_j). \tag{13}$$
In the region of extrapolation the error $\sigma^2(a_p)$ becomes dominant, and the efficiency of
the fitted value approaches the efficiency of the coefficient of highest degree. In the region
of interpolation the efficiency will be somewhat greater than this value.


(4) Estimation of the standard error of an observation 
The standard error of an observation may be estimated in several ways. Considering first
the residuals $V_p(E)$ defined as
$$V_p(E) = Y(E) - \sum_j A_j T_{jN}(E),$$
it follows from least-squares theory that
$$\sum_E V_p^2(E) = \sum_E Y^2(E) - \sum_j A_j^2 \sum_E T_{jN}^2(E). \tag{14}$$
Hence
$$\begin{aligned}
\operatorname{exp}\Big[\sum_E V_p^2(E)\Big] &= N\sigma^2(Y) - \sum_j \Big\{\sum_E T_{jN}^2(E)\Big\}\sigma^2(A_j) \\
&= (N-p-1)\,\sigma^2(Y) \\
&= r(N-p-1)\,\sigma^2(y),
\end{aligned}$$
and the expression
$$S_p = \Big\{\frac{\sum_E V_p^2(E)}{r(N-p-1)}\Big\}^{1/2} \tag{15}$$
will provide an estimate of the standard error of an observation.
Another estimate may be obtained from the residuals $v_p(\epsilon)$ defined as
$$v_p(\epsilon) = y(\epsilon) - \sum_j a_j T_{jn}(\epsilon).$$
It can be shown (Guest, 1952, p. 255) that
$$\operatorname{exp}\Big[\sum_\epsilon v_p^2(\epsilon)\Big] = \Big[n - (p+1) + \sum_j \{(1/\eta_j) - 1\}\Big]\sigma^2(y).$$
From equation (12), the last term in the square brackets is found to be approximately
$p^4/12N^2$, and so it can be neglected. Hence
$$s_p = \Big\{\frac{\sum_\epsilon v_p^2(\epsilon)}{n-p-1}\Big\}^{1/2} \tag{16}$$
will provide an estimate of the standard error of an observation.

An indication of the relative efficiencies of the estimates $s_p$, $S_p$ may be obtained by
assuming that the residuals $v_p$ are uncorrelated and of standard error $\sigma(y)$, and that the
quantities $V_p/r^{1/2}$ are also uncorrelated and of standard error $\sigma(y)$. Both these assumptions
are only very approximately true. Then, comparing (15) and (16), it will be seen that
$$\frac{(\text{s.e. } s_p)^2}{(\text{s.e. } S_p)^2} \approx \frac{N}{n} = \frac{1}{r}.$$
That is, the relative efficiency of the estimate $S_p$ is $1/r$. This estimate is very easily calculated,
using equation (14), but is rather inefficient if r is large. On the other hand, there is no
formula comparable with (14) by means of which $s_p$ can be calculated, although the value
$$\sum y^2 - \sum_j a_j^2 \sum_\epsilon T_{jn}^2(\epsilon)$$
could be used to give a rough approximation to $\sum v_p^2$ if the efficiencies of the coefficients
$a_j$ are high.
(5) The dropping of observations before grouping

If the number of observations n is prime, it will be necessary to drop one or more
observations at the ends of the range before grouping. The standard error of the least-squares
coefficient $a_j$ is proportional to $n^{-(j+\frac12)}$. Hence if the number of observations is reduced to
$n'$ by dropping $\nu$ observations before grouping, the efficiency of the estimate $a_j$ will be
reduced by the factor $(n'/n)^{2j+1}$. The efficiency of the grouping estimate is then the expression
in equation (12) multiplied by the above factor. This factor may cause a considerable drop
in efficiency. For example, if one observation is omitted when n is 50, the efficiency of $a_3$
is reduced by the factor 0.868. Table 3 shows the relative efficiencies for various values of
n of the coefficients $a_j$ when 1, 2 and 3 observations are omitted.







































































Table 3. Relative efficiencies of $a_j$ when $\nu$ observations are dropped
from the original set of n observations

               $a_0$                     $a_1$                     $a_2$
  n   $\nu$ =  1      2      3        1      2      3        1      2      3
   20     0.950  0.900  0.850    0.857  0.729  0.614    0.774  0.590  0.444
   30     0.967  0.933  0.900    0.903  0.813  0.729    0.844  0.708  0.590
   40     0.975  0.950  0.925    0.927  0.857  0.791    0.881  0.774  0.677
   50     0.980  0.960  0.940    0.941  0.885  0.831    0.904  0.815  0.734
   75     0.987  0.973  0.960    0.961  0.922  0.885    0.935  0.874  0.815
  100     0.990  0.980  0.970    0.970  0.941  0.913    0.951  0.904  0.859
  150     0.993  0.987  0.980    0.980  0.961  0.941    0.967  0.935  0.904
  250     0.996  0.992  0.988    0.988  0.976  0.964    0.980  0.961  0.941
  500     0.998  0.996  0.994    0.994  0.988  0.982    0.990  0.980  0.970

               $a_3$                     $a_4$                     $a_5$
  n   $\nu$ =  1      2      3        1      2      3        1      2      3
   20     0.698  0.478  0.321    0.630  0.387  0.232    0.569  0.314  0.167
   30     0.789  0.617  0.478    0.737  0.537  0.387    0.689  0.468  0.314
   40     0.838  0.698  0.579    0.796  0.630  0.496    0.757  0.569  0.424
   50     0.868  0.751  0.648    0.834  0.693  0.573    0.801  0.638  0.506
   75     0.910  0.828  0.751    0.886  0.784  0.693    0.863  0.743  0.638
  100     0.932  0.868  0.808    0.914  0.834  0.760    0.895  0.801  0.715
  150     0.954  0.910  0.868    0.942  0.886  0.834    0.929  0.863  0.801
  250     0.972  0.945  0.919    0.965  0.930  0.897    0.957  0.915  0.876
  500     0.986  0.972  0.959    0.982  0.965  0.947    0.978  0.957  0.936
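The entries of Table 3 are values of the factor $(n'/n)^{2j+1}$ with $n' = n - \nu$. A one-line sketch, with the hypothetical name `drop_factor`, is:

```python
def drop_factor(n, v, j):
    """Relative efficiency of a_j when v of the n observations are dropped:
    (n'/n)^(2j+1) with n' = n - v."""
    return ((n - v) / n) ** (2 * j + 1)


# The example quoted in the text: one observation omitted from 50, coefficient a_3
example = round(drop_factor(50, 1, 3), 3)
```

Rounded to three decimals, `drop_factor(50, 1, 3)` is the factor 0.868 quoted in the text, and `drop_factor(20, 1, 1)` is the entry 0.857 of Table 3.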
The effect on the fitted values of dropping observations can be found by means of tables
(Guest, 1950) of the functions $\rho_{p0}(k)$, where
$$k = 2\epsilon/n$$


and o7[uy(k)] = n—pio(k) o*(y). 
The efficiency of the fitted value at the point $\epsilon$ will be given by
pluy(€)] = o7[u,(k)]/o7[u,(k’)] 
_@ (eae ) 
1 \ Ppl k’) . 


where (assuming the observations are omitted symmetrically from the two ends of the
range)
$$k' = 2\epsilon/n', \quad \nu \text{ even}; \qquad k' = (2\epsilon+1)/n', \quad \nu \text{ odd}.$$

Tables and graphs of $\rho_{p0}$ are given in the reference quoted. It is found that $\rho_{p0}(k)$ varies
only slowly with k for a considerable part of the region of interpolation, and so in this part
the efficiency of the fitted value is close to $n'/n$, the efficiency of the coefficient $a_0$. Beyond
a value of $|k|$ given in Table 4, the variation of $\rho_{p0}(k)$ with k becomes very rapid and the
efficiency drops sharply. For large values of $|k|$ the efficiency approaches that for the
coefficient $a_p$.


Table 4. Range of $|k|$ within which the efficiency of the fitted value when
observations are omitted approximates to $n'/n$

  Degree p of polynomial    Range of $|k|$
  2                         0 to 0.50
  3                         0 to 0.70
  4                         0 to 0.80
  5                         0 to 0.85














Part II. STEP-FUNCTION METHODS
(1) The straight line 


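If $w_1(\epsilon)$ is any function for which
$$\sum w_1(\epsilon) = 0, \tag{1}$$
then the equation
$$\sum w_1(\epsilon)\,[y(\epsilon) - b_{11}\epsilon] = 0$$
will provide an estimate
$$b_{11} = \sum w_1(\epsilon)\,y(\epsilon) \Big/ \sum w_1(\epsilon)\,\epsilon \tag{2}$$
of the slope of the straight line which fits the observations. The standard error of this
estimate is found from
$$\frac{\sigma^2(y)}{\sigma^2(b_{11})} = \frac{\{\sum w_1(\epsilon)\,\epsilon\}^2}{\sum w_1^2(\epsilon)}. \tag{3}$$

The functions $w_1(\epsilon)$ which will be discussed in this part are the step functions. Step functions
are functions which are constant in magnitude over specified ranges of the variable $\epsilon$,
the magnitude usually being different in different ranges. From equation (1) $w_1(\epsilon)$ will be
an odd function of $\epsilon$, and so, for any arbitrary function $f(\epsilon)$, $\sum w_1(\epsilon) f(\epsilon)$ will be of the form
$$\Big[q_1\Big(\sum_{0,\frac12}^{\frac12(n-1)} - \sum_{0,\frac12}^{\frac12(a_1-1)}\Big) + q_2\Big(\sum_{0,\frac12}^{\frac12(a_1-1)} - \sum_{0,\frac12}^{\frac12(a_2-1)}\Big) + \ldots + q_m\Big(\sum_{0,\frac12}^{\frac12(a_{m-1}-1)} - \sum_{0,\frac12}^{\frac12(a_m-1)}\Big)\Big]\,[f(\epsilon) - f(-\epsilon)],$$
the numbers $\tfrac12(a_j-1)$ being the values of $\epsilon$ at the ends of the steps.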
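The number of steps, counting those corresponding to both positive and negative values
of $\epsilon$ and the central step of zero weight, is $2m+1$. It is required to find, for any given m,
the best values of the parameters $a_j$ locating the steps and of the corresponding weights $q_j$.

The expression $\sum w_1(\epsilon)\,\epsilon$ will consist of terms of the form
$$2\sum_{0,\frac12}^{\frac12(a_j-1)} \epsilon = (a_j^2-1)/4.$$
If
$$\alpha_j = a_j/n, \tag{4}$$
then
$$\sum w_1(\epsilon)\,\epsilon \approx \frac{n^2}{4}\big[q_1(1-\alpha_1^2) + q_2(\alpha_1^2-\alpha_2^2) + \ldots + q_m(\alpha_{m-1}^2-\alpha_m^2)\big]. \tag{5}$$
Similarly
$$\sum w_1^2(\epsilon) = n\big[q_1^2(1-\alpha_1) + q_2^2(\alpha_1-\alpha_2) + \ldots + q_m^2(\alpha_{m-1}-\alpha_m)\big]. \tag{6}$$

Thus substituting these expressions in equation (3) and using the corresponding
expression
$$n(n^2-1)/12 \approx n^3/12$$
for the least-squares estimate, the efficiency of the estimate $b_{11}$ is found to be
$$\eta(b_{11}) = \frac{3}{4}\,\frac{[q_1(1-\alpha_1^2) + q_2(\alpha_1^2-\alpha_2^2) + \ldots + q_m(\alpha_{m-1}^2-\alpha_m^2)]^2}{q_1^2(1-\alpha_1) + q_2^2(\alpha_1-\alpha_2) + \ldots + q_m^2(\alpha_{m-1}-\alpha_m)}. \tag{7}$$

Let the expression on the right be written as $\tfrac34(N^2/D)$. To maximize $\eta(b_{11})$, the fraction
$N^2/D$ is differentiated with respect to the $q_j$, $\alpha_j$, and the resultant expressions equated to
zero. On differentiating with respect to $q_j$ (writing $\alpha_0 = 1$),
$$(\alpha_{j-1}+\alpha_j)\,D = q_j N.$$
It is clear that the weights $q_j$ may be multiplied by any arbitrary factor without altering
$\eta(b_{11})$. It is convenient to choose the $q_j$ so that $(N/D)$ is unity. Then
$$\alpha_{j-1}+\alpha_j = q_j \quad (j = 2 \text{ to } m), \qquad 1+\alpha_1 = q_1. \tag{8}$$
On differentiating with respect to $\alpha_j$,
$$q_j + q_{j+1} = 4\alpha_j \quad (j = 1 \text{ to } m-1), \qquad q_m = 4\alpha_m. \tag{9}$$
Combining equations (8) and (9),
$$\alpha_{j-1}+\alpha_{j+1} = 2\alpha_j,$$
or
$$\alpha_{j-1}-\alpha_j = \Delta\alpha, \text{ a constant}, \; = 1-\alpha_1.$$
But
$$\alpha_{m-1}+\alpha_m = q_m = 4\alpha_m,$$
and so $\Delta\alpha = 2\alpha_m$. Thus
$$\alpha_m = \alpha_1 - (m-1)\Delta\alpha = 1 - m\Delta\alpha$$
and
$$\alpha_m = 1/(2m+1).$$
Hence
$$\alpha_j = \frac{2m-2j+1}{2m+1} \tag{10}$$
and
$$q_j = \alpha_{j-1}+\alpha_j = \frac{4(m-j+1)}{2m+1}. \tag{11}$$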
The numerator N can be written as
$$N = \sum_{k=1}^{m} q_{m-k+1}\,(\alpha_{m-k}^2 - \alpha_{m-k+1}^2),$$
and so, using (10) and (11),
$$\begin{aligned}
N &= \sum_k 4k\{(2k+1)^2 - (2k-1)^2\}/(2m+1)^3 \\
&= 32\sum_k k^2/(2m+1)^3 \\
&= \frac{32\,m(m+1)}{6(2m+1)^2}.
\end{aligned}$$
Thus, since $N = D$, the efficiency of the estimate $b_{11}$ is
$$\eta(b_{11}) = \tfrac34 N = 1 - \frac{1}{(2m+1)^2}. \tag{12}$$
Hence for three steps the efficiency is 0.89, for five steps 0.96, and for seven steps 0.98.
Finally, it is convenient to remove the factor $4/(2m+1)$ from the weights, leaving
$$q_j = (m-j+1). \tag{13}$$
Thus for maximum efficiency the weights are $m, m-1, \ldots, 1$, and the values of the parameters
locating the steps are
$$\frac{2m-1}{2m+1},\ \frac{2m-3}{2m+1},\ \ldots,\ \frac{1}{2m+1}.$$

The number of observations in each step is
$$\tfrac12(a_{j-1}-1) - \tfrac12(a_j-1) = \tfrac12 n(\alpha_{j-1}-\alpha_j) = n/(2m+1),$$
and the observations are divided uniformly among the $2m+1$ steps.
If the number of observations n is not an integral multiple of the number of steps, but
$$n = (2m+1)R + p \quad (|p| \le m), \tag{14}$$
then the number of observations in some of the steps will be $R+1$ instead of R. The methods
of choosing the number $n_j$ of observations in the step of weight j to give the maximum
efficiency are listed in Table 5, together with the corresponding formulae for $\sum w_1(\epsilon)\,\epsilon$. For
example, if $n = 66$ with 9 steps, the number of observations in the steps of largest $|\epsilon|$ is
$n_4 = 7$, while $n_3 = 8$, $n_2 = 7$, $n_1 = 7$, $n_0 = 8$. In forming $\sum w_1(\epsilon)\,y(\epsilon)$, the sum of the $n_j$
observations in the step of weight j corresponding to negative values of $\epsilon$ is subtracted from the
sum of the $n_j$ observations in the step of weight j corresponding to positive values of $\epsilon$,
and the difference is multiplied by the weight j. The value of $b_{11}$ is then given by equation (2).
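The procedure for the straight line may be sketched as follows. This is an illustration only, in exact rational arithmetic; the function names and the convention of passing the step sizes as a list `[n_0, n_1, ..., n_m]` are assumptions made for the example, not part of the paper.

```python
from fractions import Fraction


def step_weights(counts):
    """counts = [n_0, n_1, ..., n_m]: observations in the central step and in
    each outer step of weight j (on each side). Returns (eps, w)."""
    m = len(counts) - 1
    n = counts[0] + 2 * sum(counts[1:])
    eps = [Fraction(2 * i - (n - 1), 2) for i in range(n)]  # -(n-1)/2 ... +(n-1)/2
    w = []
    for j in range(m, 0, -1):          # negative side, largest |eps| first
        w += [-j] * counts[j]
    w += [0] * counts[0]               # central step of zero weight
    for j in range(1, m + 1):          # positive side
        w += [j] * counts[j]
    return eps, w


def sum_w_eps(counts):
    eps, w = step_weights(counts)
    return sum(wi * e for wi, e in zip(w, eps))


def slope(y, counts):
    # Equation (2): b_11 = sum w y / sum w eps
    eps, w = step_weights(counts)
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(wi * e for wi, e in zip(w, eps))


def efficiency(counts):
    # (sum w eps)^2 / (sum w^2 * sum eps^2), from equation (3) and the
    # least-squares variance sigma^2 / sum eps^2
    eps, w = step_weights(counts)
    num = sum(wi * e for wi, e in zip(w, eps)) ** 2
    return num / (sum(wi * wi for wi in w) * sum(e * e for e in eps))


def limiting_efficiency(m):
    # Equation (12)
    return 1 - Fraction(1, (2 * m + 1) ** 2)
```

For the example in the text, `sum_w_eps([8, 7, 7, 8, 7])` gives 3219 for $n = 66$ with nine steps. For nine observations in three equal steps `efficiency([3, 3])` is 9/10, slightly above the limiting value 8/9 from equation (12).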


(2) Polynomials of higher degree 
(a) Second degree polynomials 


To obtain an estimate $b_{22}$ of the second degree coefficient a function $w_2(\epsilon)$ is required
such that
$$\sum w_2(\epsilon) = 0, \tag{15}$$
$$\sum w_2(\epsilon)\,\epsilon = 0. \tag{16}$$
The estimate is then
$$b_{22} = \sum w_2(\epsilon)\,y(\epsilon) \Big/ \sum w_2(\epsilon)\,\epsilon^2, \tag{17}$$
and its standard error is found from
$$\frac{\sigma^2(y)}{\sigma^2(b_{22})} = \frac{\{\sum w_2(\epsilon)\,\epsilon^2\}^2}{\sum w_2^2(\epsilon)}. \tag{18}$$

Condition (16) is satisfied if $w_2(\epsilon)$ is an even function of $\epsilon$. Condition (15) requires that the
values $w_2(\epsilon)$ should fall into two groups, one group in which $w_2(\epsilon)$ is positive and the other
group in which $w_2(\epsilon)$ is negative. Accordingly, the appropriate step functions will be such
that, for any arbitrary function $f(\epsilon)$, $\sum w_2(\epsilon) f(\epsilon)$ is of the form
$$\Big[\Big\{q_1\Big(\sum_{-\frac12(n-1)}^{\frac12(n-1)} - \sum_{-\frac12(a_1-1)}^{\frac12(a_1-1)}\Big) + q_2\Big(\sum_{-\frac12(a_1-1)}^{\frac12(a_1-1)} - \sum_{-\frac12(a_2-1)}^{\frac12(a_2-1)}\Big) + \ldots\Big\}
 - \Big\{r_1\Big(\sum_{-\frac12(b_1-1)}^{\frac12(b_1-1)} - \sum_{-\frac12(b_2-1)}^{\frac12(b_2-1)}\Big) + r_2\Big(\sum_{-\frac12(b_2-1)}^{\frac12(b_2-1)} - \sum_{-\frac12(b_3-1)}^{\frac12(b_3-1)}\Big) + \ldots\Big\}\Big]\,f(\epsilon).$$

The functions will be called single-step functions if there is one step in each group,
double-step functions if there are two steps.


Table 5. Step functions for the fitting of straight lines
($n_j$: number of observations in step of weight j. $n = (2m+1)R + p$: total number of observations.)

Three steps: $n = 3R + p$

Five steps: $n = 5R + p$
$\eta(b_{11}) = 0.960$.

Seven steps: $n = 7R + p$
$\eta(b_{11}) = 0.980$.

Nine steps: $n = 9R + p$

  p      $n_4$   $n_3$      $n_2$   $n_1$      $n_0$      $\sum w_1(\epsilon)\,\epsilon$
   0     R     R        R     R        R        $60R^2$
  ±1     R     R        R     R        R ± 1    $60R^2 \pm 10R$
  ±2     R     R        R     R ± 1    R        $60R^2 \pm 21R + 1$
  ±3     R     R ± 1    R     R        R ± 1    $60R^2 \pm 39R + 6$
  ±4     R     R ± 1    R     R ± 1    R        $60R^2 \pm 50R + 10$
The expression $\sum w_2(\epsilon)\,\epsilon^2$ will contain terms
$$\sum_{-\frac12(a_j-1)}^{\frac12(a_j-1)} \epsilon^2 = a_j(a_j^2-1)/12,$$
for which the approximation $n^3\alpha_j^3/12$ will be used, where $\alpha_j$ is $a_j/n$. Hence
$$\sum w_2(\epsilon)\,\epsilon^2 = \frac{n^3}{12}\big[\{q_1(1-\alpha_1^3) + q_2(\alpha_1^3-\alpha_2^3) + \ldots\} - \{r_1(\beta_1^3-\beta_2^3) + \ldots + r_m\beta_m^3\}\big],$$
$$\sum w_2^2(\epsilon) = n\big[\{q_1^2(1-\alpha_1) + q_2^2(\alpha_1-\alpha_2) + \ldots\} + \{r_1^2(\beta_1-\beta_2) + \ldots + r_m^2\beta_m\}\big].$$
Substituting these expressions in equation (18) and using the corresponding expression
$$n(n^2-1)(n^2-4)/180 \approx n^5/180$$
for the least-squares estimate, the efficiency of the estimate $b_{22}$ is found to be
$$\eta(b_{22}) = \frac{5}{4}\,\frac{[\{q_1(1-\alpha_1^3) + \ldots\} - \{r_1(\beta_1^3-\beta_2^3) + \ldots\}]^2}{\{q_1^2(1-\alpha_1) + \ldots\} + \{r_1^2(\beta_1-\beta_2) + \ldots\}}, \tag{19}$$
with the conditional equation
$$\sum w_2(\epsilon) = \{q_1(1-\alpha_1) + \ldots\} - \{r_1(\beta_1-\beta_2) + \ldots\} = 0. \tag{20}$$

The values q, $\alpha$, r, $\beta$ for maximum efficiency were found by the method of undetermined
multipliers. The resulting equations are simultaneous quadratics, and the values of the
parameters were obtained by successive approximations. The solutions for single-step
and double-step functions are listed in Table 6.

The maximum efficiency of the estimate $b_{22}$ is found to be 0.8958 for single-step functions
and 0.9630 for double-step functions.


Table 6. Values of parameters for maximum efficiencies

              Second degree coefficient        Third degree coefficient
              Single-step    Double-step       Single-step    Double-step
              functions      functions         functions      functions
  $\alpha_1$         0.7363         0.8482            0.8621         0.9222
  $\alpha_2$                        0.6757                           0.8309
  $\beta_1$          0.4414         0.5007            0.7024         0.7399
  $\beta_2$                         0.3207            0.1205         0.6536
  $\beta_3$                                                          0.1265
  $q_1$              1.3042         1.5774            1.0393         1.2465
  $q_2$                             0.7590                           0.5940
  $r_1$              0.7793         0.4762            0.5573         0.3167
  $r_2$                             0.8875                           0.5919
  $\eta$             0.8958         0.9630            0.9014         0.9473











(b) Third degree polynomials
To obtain an estimate $b_{33}$ of the third degree coefficient a function $w_3(\epsilon)$ is required such
that
$$\sum w_3(\epsilon) = \sum w_3(\epsilon)\,\epsilon^2 = 0, \tag{21}$$
$$\sum w_3(\epsilon)\,\epsilon = 0. \tag{22}$$
The estimate is then
$$b_{33} = \sum w_3(\epsilon)\,y(\epsilon) \Big/ \sum w_3(\epsilon)\,\epsilon^3, \tag{23}$$
and its standard error is found from
$$\frac{\sigma^2(y)}{\sigma^2(b_{33})} = \frac{\{\sum w_3(\epsilon)\,\epsilon^3\}^2}{\sum w_3^2(\epsilon)}. \tag{24}$$
Condition (21) requires that $w_3(\epsilon)$ be an odd function of $\epsilon$. Proceeding as in §(a), the
efficiency of the estimate is found to be given by the expression
$$\eta(b_{33}) = \frac{175}{64}\,\frac{[\{q_1(1-\alpha_1^4) + \ldots\} - \{r_1(\beta_1^4-\beta_2^4) + \ldots + r_m(\beta_m^4-\beta_{m+1}^4)\}]^2}{\{q_1^2(1-\alpha_1) + \ldots\} + \{r_1^2(\beta_1-\beta_2) + \ldots + r_m^2(\beta_m-\beta_{m+1})\}}, \tag{25}$$
with the conditional equation
$$\sum w_3(\epsilon)\,\epsilon = \{q_1(1-\alpha_1^2) + \ldots\} - \{r_1(\beta_1^2-\beta_2^2) + \ldots\} = 0. \tag{26}$$

The maximum efficiencies for single-step functions and for double-step functions are
found to occur at the values of the parameters listed in Table 6. The maximum efficiency
for the estimate $b_{33}$ is 0.9014 for single-step functions and 0.9473 for double-step functions.
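The efficiency expressions (19) and (25) can be evaluated directly at the parameter values of Table 6. The sketch below is an illustration under stated assumptions: the constants 5/4 and 175/64 and the powers 3 and 4 follow from the least-squares variances $180\sigma^2/n^5$ and $2800\sigma^2/n^7$, and the function name and argument layout are hypothetical.

```python
def step_efficiency(q, alpha, r, beta, degree):
    """Evaluate expression (19) (degree=2) or (25) (degree=3).

    q, r  : weights of the positive and negative groups of steps
    alpha : [alpha_1, ..., alpha_m]
    beta  : [beta_1, ..., beta_m] for degree 2 (the last step ends at 0),
            [beta_1, ..., beta_{m+1}] for degree 3
    """
    k = degree + 1                               # power in the numerator
    const = {2: 5 / 4, 3: 175 / 64}[degree]
    a = [1.0] + list(alpha)
    b = list(beta) + ([0.0] if degree == 2 else [])
    num = sum(qi * (a[i] ** k - a[i + 1] ** k) for i, qi in enumerate(q))
    num -= sum(ri * (b[i] ** k - b[i + 1] ** k) for i, ri in enumerate(r))
    den = sum(qi ** 2 * (a[i] - a[i + 1]) for i, qi in enumerate(q))
    den += sum(ri ** 2 * (b[i] - b[i + 1]) for i, ri in enumerate(r))
    return const * num ** 2 / den


# Single-step function, second degree coefficient (Table 6)
eta_single_2 = step_efficiency([1.3042], [0.7363], [0.7793], [0.4414], 2)
# Double-step function, third degree coefficient (Table 6)
eta_double_3 = step_efficiency([1.2465, 0.5940], [0.9222, 0.8309],
                               [0.3167, 0.5919], [0.7399, 0.6536, 0.1265], 3)
```

Because the tabulated parameters are rounded to four decimals, the computed efficiencies agree with the quoted maxima only to about the fourth decimal place.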


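(3) Polynomial coefficients
The estimate of the coefficient of $\epsilon^j$ in the power series expansion of the polynomial of
degree p will be denoted by $b_{pj}$. $b_{pj}$ can be expressed as a linear function of the quantities
$$b_{kk} = \sum w_k(\epsilon)\,y(\epsilon) \Big/ \sum w_k(\epsilon)\,\epsilon^k,$$
where
$$\sum w_k(\epsilon)\,\epsilon^m = 0 \quad (k > m).$$
For if the estimates are to be unbiased
$$\sum w_j(\epsilon)\Big[y(\epsilon) - \sum_k b_{pk}\,\epsilon^k\Big] = 0,$$
and hence
$$\sum w_j(\epsilon)\,y(\epsilon) = \sum w_j(\epsilon) \sum_k b_{pk}\,\epsilon^k.$$
Dividing by $\sum w_j(\epsilon)\,\epsilon^j$,
$$b_{jj} = b_{pj} - \sum_{k=j+1}^{p} \alpha_{jk}\,b_{pk},$$
where
$$\alpha_{jk} = -\sum w_j(\epsilon)\,\epsilon^k \Big/ \sum w_j(\epsilon)\,\epsilon^j. \tag{27}$$
Hence
$$b_{pj} = \sum_{k=j}^{p} \beta_{jk}\,b_{kk}, \tag{28}$$
where
$$\beta_{jk} = \alpha_{jk} + \sum_{l=j+1}^{k-1} \alpha_{jl}\,\beta_{lk}, \qquad \beta_{kk} = 1, \tag{29}$$
and $\beta_{jk}$ vanishes if $j+k$ is odd. For the second and third degree polynomials
$$b_{20},\ b_{30} = b_{00} + \beta_{02}b_{22}, \qquad b_{32} = b_{22},$$
$$b_{31} = b_{11} + \beta_{13}b_{33}, \qquad b_{21} = b_{11},$$
where
$$\beta_{02} = -\sum\epsilon^2/n = -(n^2-1)/12,$$
$$\beta_{13} = -\sum w_1(\epsilon)\,\epsilon^3 \Big/ \sum w_1(\epsilon)\,\epsilon.$$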
(4) The fitted values
The fitted polynomial of the third degree is
$$\begin{aligned}
u_3(\epsilon) &= b_{30} + b_{31}\epsilon + b_{32}\epsilon^2 + b_{33}\epsilon^3 \\
&= b_{00} + b_{11}\epsilon + b_{22}(\epsilon^2+\beta_{02}) + b_{33}(\epsilon^3+\beta_{13}\epsilon).
\end{aligned}$$
The coefficients $b_{jj}$ are independent, except for $b_{11}$ and $b_{33}$, where
$$\operatorname{cov}(b_{11}, b_{33})/\sigma^2(y) = \sum w_1(\epsilon)\,w_3(\epsilon) \Big/ \sum w_1(\epsilon)\,\epsilon\,\sum w_3(\epsilon)\,\epsilon^3.$$
Hence the standard error of the fitted value is obtained from
$$\begin{aligned}
\sigma^2[u_3(\epsilon)] ={}& \sigma^2(b_{00}) + \epsilon^2\sigma^2(b_{11}) + (\epsilon^2+\beta_{02})^2\sigma^2(b_{22}) + (\epsilon^3+\beta_{13}\epsilon)^2\sigma^2(b_{33}) \\
&+ 2(\epsilon^4+\beta_{13}\epsilon^2)\operatorname{cov}(b_{11}, b_{33}).
\end{aligned}$$
The standard error of the fitted value for polynomials of lower degree is obtained by merely
dropping the higher degree terms.

It is convenient to introduce the variable $k = 2\epsilon/n$. Then, neglecting terms of order $n^{-2}$
and writing $\eta_j$ for $\eta(b_{jj})$,
$$\frac{n\,\sigma^2[u_3(k)]}{\sigma^2(y)} = 1 + \frac{3k^2}{\eta_1} + \frac{5}{4\eta_2}(3k^2-1)^2 + \frac{175}{4\eta_3}(k^3+4\beta_{13}n^{-2}k)^2 + \frac{n^5}{8}(k^4+4\beta_{13}n^{-2}k^2)\,\frac{\operatorname{cov}(b_{11}, b_{33})}{\sigma^2(y)}. \tag{30}$$
By calculating the value of this expression for the least-squares and step-function
polynomials, the efficiencies of the fitted values may be obtained for various k.
The efficiencies of the fitted values with single-step and double-step functions are shown
as functions of k for polynomials of the first, second and third degrees in Table 7.


Table 7. Efficiencies of the fitted values using step-functions ($k = 2\epsilon/n$)

  Degree          p = 1                        p = 2                        p = 3
  $|k|$     Single-step  Double-step     Single-step  Double-step     Single-step  Double-step
  0        1.000        1.000           0.939        0.979           0.939        0.979
  0.2      0.987        0.996           0.942        0.980           0.941        0.973
  0.4      0.961        0.987           0.948        0.982           0.948        0.967
  0.6      0.939        0.979           0.939        0.979           0.956        0.970
  0.8      0.924        0.973           0.916        0.971           0.919        0.972
  1.0      0.914        0.970           0.904        0.966           0.887        0.959
  1.2      0.908        0.967           0.899        0.964           0.888        0.953
  1.4      0.903        0.966           0.897        0.963           0.891        0.950
  1.6      0.900        0.964           0.897        0.963           0.894        0.949





























(5) Tabulation of step functions 


In a previous paper (Guest, 1951) the optimum single-step functions have been listed
for the range p = 1 (1) 5, n = 7 (1) 55. These tables were obtained by calculating the efficiencies
for chosen arrangements of groups at each value of n and selecting the method of grouping
which gave maximum efficiency. In point of fact such a procedure is unnecessary except
when n is small. The efficiency decreases only slowly as the arrangement of groups departs
from the optimum arrangement, and it is quite satisfactory to use the general parameters
$\alpha$ to determine the number of observations in each group.

In tabulating the double-step functions the location of the groups can be obtained from
the parameters $\alpha$, $\beta$. The corresponding weights q, r are not exact multiples of one another
and so the optimum weights will not be integers. It is therefore desirable to take values
slightly different from the optimum weights. The recommended weights are:

2nd degree: $q_1 = 10w$, $q_2 = 5w$, $r_1 = 3w$,
            w, $r_2$ integers adjusted so that $\sum w_2(\epsilon) = 0$;
3rd degree: $q_1 = 4w$, $q_2 = 2w$, $r_1 = w$,
            w, $r_2$ integers adjusted so that $\sum w_3(\epsilon)\,\epsilon = 0$.

Then $\eta_2 = 0.9628$, $\eta_3 = 0.9471$, and the drop in efficiency is negligible in each case. By this
choice of weights the number of weights to be tabulated is reduced to two (w and $r_2$).

Single-step functions give efficiencies of about 0.90. With double-step functions, for a
second degree polynomial the efficiencies of all quantities are greater than 0.96. For a third
degree polynomial the efficiency of the fitted value is greater than 0.96 in the region of
interpolation. The efficiency of the fitted value in the region of extrapolation drops to
about 0.945, the efficiency of the coefficient $b_{33}$.

Tables of both single-step and double-step functions have been prepared for values of n 
up to 100, the single-step functions up to the fifth degree and the double-step functions up 


to the third degree. These tables are too long to be reproduced here, but copies may be 
obtained from the author. 


CONCLUSION 


An attempt has been made to assess the saving in time that would eventuate in practical 
examples from the use of grouping methods. The example chosen for the test was that 
dealing with coded sugar prices which was used by Anderson & Houseman (1942). Table 8 
gives a summary of the solutions obtained by the various methods, and the time taken 
using each method (including the time required for the checking of all calculations). The 
calculations were all carried out by a computor familiar with the methods, using an electric 
machine equipped with semi-automatic multiplication. 


Table 8. Solutions of example on coded sugar prices obtained by different methods

                          Least-squares methods                  Step-function methods
                     62 observations   31 groups   15 groups    Single-step   Double-step
  $-b_{33} \times 10^3$     2.59 ± 0.32       2.59        2.95         2.51          2.44
  $b_{32} \times 10^2$      2.86 ± 0.50       2.85        2.92         2.92          2.84
  $b_{31} \times 10$        9.8 ± 2.0         9.8         12.4         10.1          8.7
  $b_{30}$                  12.4 ± 2.2        12.4        11.7         12.2          12.4

  Minutes required:
    By machine
    By logarithms
An examination of the times required for the calculation of the coefficients shows that
the saving in time through the use of grouping methods is not as large as might be expected. 
The reduction in the strain on the computor is more significant than might be deduced 
from the times shown, because with grouping methods the multiplying factors are both 
smaller in number and smaller in size, and the chance of making an error is much reduced. 
In fact the step-function methods are almost foolproof, except for the possibility of copying 
errors. The saving in time becomes much more pronounced if computational facilities are 
limited—for instance, if only a hand-operated machine is available. Step-function methods 
can be used even when no machine is available, the quantities $\sum w_j(\epsilon)\,y(\epsilon)$ being obtained
by simple addition and the coefficients $b_{pj}$ by the use of logarithms.

It will be seen that the values obtained by using groups of two are the same as the least- 
squares values for the original sixty-two observations. However, if the observations are 
grouped in fours, the two end observations must be omitted and the values are noticeably 
different. The fact that the deviations for the coefficients $b_{33}$, $b_{31}$ exceed the standard errors 
indicates that the variations from a smooth curve are not random, as is apparent from the 
nature of the data. The deviations for the step-function solutions are all less than the 
standard errors. 

The weakness of the step-function methods lies in the difficulty of estimating the standard 
errors, and of deciding the degree of the polynomial to be used if this is not already known. 
This is because there is no formula comparable with the expression for the sum of squares 
of the residuals, $\sum v^2 = \sum y^2 - \cdots$, 
of the least-squares method. To estimate the standard error of an observation it is necessary 
to calculate the residuals individually. If the fitted values at the points of observation are 
also required the calculation of the residuals will not take much time, but otherwise the 
time spent in calculating the residuals will largely offset the time saved in the calculation of 
the coefficients $b_{pj}$. 

In practice by far the largest number of curves to be fitted are of the first degree. Step- 
function methods of fitting straight lines are especially useful in undergraduate instruction 
where calculating machines are not usually available, and they may also be used to give 
a rapid independent check on the least-squares solution. The loss in accuracy compared 
with the least-squares method is quite small, and the use of single-step and double-step 
functions in teaching and in routine work is consequently strongly recommended. 

If the observations are to be grouped before fitting a least-squares curve it is important 
that the number of observations omitted at the ends of the range be very small. Tables 2 
and 3 may be used as a guide in selecting the number of groups to give a satisfactory value 
for the efficiency. 
NEW TECHNIQUES FOR THE ANALYSIS OF 
ABSENTEEISM DATA 


By A. G. ARBOUS AND H. S. SICHEL 


National Institute for Personnel Research, South African Council for Scientific 
and Industrial Research 


1. INTRODUCTION 


The shortage of competent man-power has in recent years accentuated the problem of 
absence in industry, and revived both managerial and scientific interest in the subject. To 
management these occurrences are a drain on the productivity of the concern which can 
no longer be dealt with by methods which, presumably, were effective when the supply of 
labour was abundant. They now turn for advice to the social scientist, who, looking deeper 
into the problem, is inclined to regard absenteeism as symptomatic of occupational casualty; 
undesirable for the well-being of the persons engaged in production. 

In view of this state of affairs it is surprising that the statistical tools for analysing and 
interpreting absence data should have remained relatively undeveloped. By contrast one 
compares the more thorough, though not necessarily more successful, attack which has 
been made in the field of accidents. And yet, as will be seen later, the two phenomena have 
much in common. In the past the analysis of absences has confined itself largely to the 
monotonous and often meaningless calculation of a series of ‘rates’ (frequency, severity 
and disability) for innumerable classifications of the data. These ‘rates’ are not amenable 
to statistical analysis, and their interpretation is consequently difficult. More recently 
authors such as Fox & Scott (1943) have departed from this approach by giving frequency 
distributions of absences and time lost, without attempting to give these any mathematical 
description. Further advance was made by Russell, Whitwell & Ryle (1947) and Sutherland 
& Whitwell (1948), who applied to absence distributions the models developed by Greenwood 
& Yule (1920) in the study of accidents, and claim thereby to have established the pheno- 
menon of absence-proneness. However, the inadequacy of inferences drawn from the 
fitting of theoretical univariate distributions to observed sets of data has been clearly 
demonstrated by Maritz (1950) and Arbous & Kerrich (1951), and the case would appear to 
be unsubstantiated. 

After the present study was completed we found that another investigator (Norris, 1951) 
had adopted the approach suggested by Lundberg (1940) and Maritz (1950), viz. that the 
existence of proneness in absence behaviour can only be established by the method of 
correlation. Unfortunately, apart from showing the existence of correlation, its physical 
significance was not demonstrated—a defect which we hope to remedy in this study. 

It is therefore the object of this paper to develop more precisely the mathematical model 
for describing the distribution of absences to a group of people in single- and double-exposure 
periods, thus allowing for both the univariate and bivariate case. We shall, moreover, con- 
sider the physical significance of the properties of such distributions (a) for the interpretation 
of research findings, with particular reference to the phenomenon of absence-proneness, and 
(b) for the development of practical techniques. For reasons which cannot be elaborated 
here, we shall confine our observations to absences which occur (for various reasons) to 
individuals, and not the duration of absence. One absence is thus defined as a period (of any 
length) of unbroken non-attendance at work. These absences can be grouped according to 
various causes, if it is desired. 


2. THE MATHEMATICAL MODEL 
(a) The univariate negative binomial 
Let the probability distribution of absences for an individual during an exposure period of 
length $kt$ be represented by 
$$\text{prob.}(x) = e^{-k\lambda}\,\frac{(k\lambda)^x}{x!}. \tag{1}$$
Following the original suggestion of Greenwood & Yule (1920) we shall assume that in 
a group of persons the individual liabilities, as measured by the $\lambda$'s, are distributed according 
to 
$$dF = \left(\frac{p}{m}\right)^{p} \frac{1}{\Gamma(p)}\, e^{-(p/m)\lambda}\, \lambda^{p-1}\, d\lambda \quad (0 \le \lambda \le \infty), \tag{2}$$
where $m$ and $p$ are the unknown parameters of the law. For (2) we have 
$$E(\lambda) = m, \tag{3}$$
$$\sigma_\lambda = \frac{m}{\sqrt{p}}, \tag{4}$$
$$\beta_1(\lambda) = \frac{4}{p}. \tag{5}$$


The probability of finding a person in the group having $x$ absences in one and the same 
exposure period of duration $kt$ is given by 
$$f(x) = \left(\frac{p}{m}\right)^{p} \frac{k^{x}}{\Gamma(p)\,\Gamma(x+1)} \int_0^\infty \lambda^{x+p-1}\, e^{-(k+p/m)\lambda}\, d\lambda. \tag{6}$$
On integration we find 
$$f(x) = \left(\frac{p}{p+km}\right)^{p} \frac{\Gamma(x+p)}{\Gamma(p)\,\Gamma(x+1)} \left(\frac{km}{p+km}\right)^{x}, \tag{7}$$
which is the negative binomial distribution with exponent $p$ and mean $km$. For an observational 
period of length $t$ we have $k = 1$ and the above distribution takes on the standard 
form. From equation (7) we deduce: 
(1) The mean number of absences incurred by a group of persons in an observational 
period is directly proportional to the length of that period, i.e. 
$$\mu_1' = km, \tag{8}$$
where $m$ is the mean for duration $t$ and $km$ the mean for duration $kt$. 
(2) From the moments of the negative binomial we find for the variance of absences in 
period $kt$ 
$$\mu_2 = km\left(1 + \frac{km}{p}\right). \tag{9}$$
(3) Parameter p is independent of the length of the exposure. 
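
The three deductions above can be checked numerically. The following is a minimal sketch, assuming illustrative parameter values that are not taken from the paper; the function names are hypothetical. It evaluates equation (7) on a truncated range of $x$ and compares the resulting mean and variance with equations (8) and (9).

```python
import math

def neg_binomial_pmf(x, m, p, k=1.0):
    """Equation (7): probability of x absences in an exposure of length k*t."""
    km = k * m
    log_pmf = (
        p * math.log(p / (p + km))
        + math.lgamma(x + p) - math.lgamma(p) - math.lgamma(x + 1)
        + x * math.log(km / (p + km))
    )
    return math.exp(log_pmf)

def mean_and_variance(m, p, k=1.0, x_max=2000):
    """Moments by direct summation; x_max is an assumed truncation point."""
    probs = [neg_binomial_pmf(x, m, p, k) for x in range(x_max)]
    mean = sum(x * q for x, q in enumerate(probs))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
    return mean, var

# Illustrative values only (assumptions, not estimates from the paper).
m, p, k = 2.0, 1.5, 3.0
mean, var = mean_and_variance(m, p, k)
expected_mean = k * m                        # equation (8)
expected_var = k * m * (1.0 + k * m / p)     # equation (9)
```

Because the exponent $p$ enters equation (7) unchanged for every $k$, the same `p` is passed whatever the length of exposure, which is deduction (3).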


From the first four moments of the standard negative binomial distribution ($k = 1$) we 
find the Pearsonian coefficients of skewness and kurtosis 
$$\beta_1 = \frac{(p+2m)^2}{mp(p+m)} \tag{10}$$
and 
$$\beta_2 = 3 + \frac{6}{p} + \frac{p}{m(p+m)}. \tag{11}$$
For $p$ large and $m$ finite 
$$\beta_1 \to \frac{1}{m}, \qquad \beta_2 \to 3 + \frac{1}{m},$$
which are the respective shape coefficients of the Poisson distribution. For both $p$ and 
$m$ large, $\beta_1 \to 0$, $\beta_2 \to 3$, and under these conditions the negative binomial tends towards the 
normal distribution. 
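
The two limiting cases can be illustrated with a short sketch of equations (10) and (11). The function names and the large value standing in for "infinity" are assumptions made for illustration only.

```python
def beta1(m, p):
    """Equation (10): Pearsonian skewness coefficient."""
    return (p + 2.0 * m) ** 2 / (m * p * (p + m))

def beta2(m, p):
    """Equation (11): Pearsonian kurtosis coefficient."""
    return 3.0 + 6.0 / p + p / (m * (p + m))

LARGE = 1e9  # assumed stand-in for a parameter tending to infinity

m = 2.0
near_poisson = (beta1(m, LARGE), beta2(m, LARGE))         # approaches (1/m, 3 + 1/m)
near_normal = (beta1(LARGE, LARGE), beta2(LARGE, LARGE))  # approaches (0, 3)
```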


(b) The symmetrical bivariate negative binomial 


If the existence or non-existence of absence-proneness in a group of individuals is to be 
established by correlational methods (Maritz, 1950) we must construct a bivariate model. 
The bivariate negative binomial distribution was first given by Lundberg (1940) and has 
again been derived recently by Arbous & Kerrich (1951). In the field of absenteeism it is 
convenient to split the total exposure period into two equal parts. In what follows we shall 
deduce the symmetrical bivariate negative binomial which is a specialized but simpler case 
of Lundberg’s distribution. On the other hand, we believe that some of the formulae 
pertaining to the arrays, the limiting conditions of correlation and the operating charac- 
teristic are new. 

Suppose that we deal with two equilong non-overlapping periods of exposure. Each 
individual will have $x$ absences in the first and $y$ absences in the second period. Provided an 
individual's absences in the second period are independent of those incurred in the first, 
and provided the individual liabilities to absence follow equation (2), we find for the 
symmetrical bivariate negative binomial 
$$\phi(y, x) = \left(\frac{p}{m}\right)^{p} \frac{1}{\Gamma(p)\,\Gamma(y+1)\,\Gamma(x+1)} \int_0^\infty \lambda^{y+x+p-1}\, e^{-(2+p/m)\lambda}\, d\lambda. \tag{12}$$
After integration 
$$\phi(y, x) = \left(\frac{p}{p+2m}\right)^{p} \frac{\Gamma(y+x+p)}{\Gamma(p)\,\Gamma(y+1)\,\Gamma(x+1)} \left(\frac{m}{p+2m}\right)^{y+x}. \tag{13}$$
The marginal distributions of (13), their means and their variances are given by equations 
(7), (8) and (9) respectively with $k = 1$. The product-moment correlation of a symmetrical 
bivariate distribution is 
$$\rho_{yx} = \frac{\mu_{11}' - (\mu_1')^2}{\mu_2},$$
where $\mu_{11}'$ is the product moment about the origin. This may be found from 





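$$\mu_{11}' = \sum_{y=0}^{\infty} \sum_{x=0}^{\infty} y\,x\,\phi(y, x) = \left(\frac{p}{p+2m}\right)^{p} \sum_{y=1}^{\infty} \sum_{x=1}^{\infty} \frac{\Gamma(y+x+p)}{\Gamma(p)\,\Gamma(y)\,\Gamma(x)} \left(\frac{m}{p+2m}\right)^{y+x}. \tag{14}$$
Substitute 
$$y = y' + 1, \tag{15a}$$
$$x = x' + 1, \tag{15b}$$
$$p = p' - 2, \tag{15c}$$
$$m = \left(1 - \frac{2}{p'}\right) m', \tag{15d}$$
into the double sum on the right-hand side of equation (14) and we find for it 
$$\left(\frac{m}{p+2m}\right)^{2} \sum_{y'=0}^{\infty} \sum_{x'=0}^{\infty} \frac{\Gamma(y'+x'+p')}{\Gamma(p)\,\Gamma(y'+1)\,\Gamma(x'+1)} \left(\frac{m'}{p'+2m'}\right)^{y'+x'} = \left(\frac{m}{p+2m}\right)^{2} \frac{\Gamma(p')}{\Gamma(p)} \left(\frac{p'+2m'}{p'}\right)^{p'} = \frac{p+1}{p}\, m^2 \left(\frac{p+2m}{p}\right)^{p}.$$
Finally, by substitution of the last expression into equation (14) 
$$\mu_{11}' = \frac{p+1}{p}\, m^2 \tag{16}$$
and 
$$\rho_{yx} = \frac{m}{p+m}. \tag{17}$$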


From (17) we arrive at the important conclusion that, given a phenomenon which may be 
described by a symmetrical bivariate negative binomial, the product-moment correlation 
may be expressed in terms of the parameters of the univariate marginal distribution. This 
result was already known to Newbold (1927). 
If we interpret $\rho_{yx}$ as a reliability coefficient of observed absences in one period of duration $t$ 
and we lengthen this period to $kt$ we step up the reliability to 
$$\rho_{yx,kt} = \frac{km}{p+km}, \tag{18}$$
in view of equation (7). We could have deduced this result from the general Spearman-Brown 
reliability formula 
$$\rho_{yx,kt} = \frac{k\rho_{yx}}{1 + (k-1)\rho_{yx}},$$
because substitution of (17) into the above gives again (18). 
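
The agreement between the parametric form and the Spearman-Brown formula is easy to confirm numerically. The sketch below uses hypothetical function names and illustrative parameter values that are not taken from the paper.

```python
def rho_from_parameters(m, p, k=1.0):
    """Equations (17) and (18): correlation for an exposure of length k*t."""
    return k * m / (p + k * m)

def spearman_brown(rho, k):
    """General Spearman-Brown step-up of a reliability coefficient."""
    return k * rho / (1.0 + (k - 1.0) * rho)

# Illustrative values only (assumptions, not estimates from the paper).
m, p = 2.0, 1.5
pairs = [
    (rho_from_parameters(m, p, k), spearman_brown(rho_from_parameters(m, p), k))
    for k in (0.5, 2.0, 5.0)
]
```

Each pair holds the value from equation (18) and the stepped-up value of equation (17); the two entries should coincide up to rounding.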
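We now derive the probability distributions of arrays for $x = 0, 1, 2, \ldots, \infty$ 
$$\psi(y \mid x) = \frac{\phi(y, x)}{f(x)},$$
from which we find the array distribution 
$$\psi(y \mid x) = \left(1 - \frac{m}{p+2m}\right)^{x+p} \frac{\Gamma(y+x+p)}{\Gamma(x+p)\,\Gamma(y+1)} \left(\frac{m}{p+2m}\right)^{y}. \tag{19}$$
Substitution of 
$$x + p = p'', \tag{20a}$$
$$\frac{m}{p+2m} = \frac{m''}{p''+m''} \tag{20b}$$
into (19) gives 
$$\psi(y \mid x) = \left(\frac{p''}{p''+m''}\right)^{p''} \frac{\Gamma(y+p'')}{\Gamma(p'')\,\Gamma(y+1)} \left(\frac{m''}{p''+m''}\right)^{y}, \tag{21}$$
which is a standard negative binomial with mean and variance (in the original parameters) 
$$\mu_{1,\text{arr.}}' = \frac{m}{p+m}\,(x+p) = \rho x + \rho p,$$
$$\mu_{2,\text{arr.}} = \frac{(p+2m)\,m}{(p+m)^2}\,(x+p) = (1+\rho)(\rho x + \rho p).$$
Hence for the regression $y$ on $x$ 
$$\bar{y}_x = \rho x + \rho p, \tag{22}$$
and the variance within an array 
$$\mu_{2,\text{arr.}} = (1+\rho)\,\mu_{1,\text{arr.}}'. \tag{23}$$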

From the last two equations, it follows that the regressions y on x and for reason of 
symmetry also x on y, are linear but heteroscedastic. The variances within arrays are 
increasing linearly with the array means. The slope constant of the regression lines is equal 
to the product-moment correlation $\rho$. 
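
As a rough check of the linear, heteroscedastic regression, the array moments can be computed directly from the bivariate probabilities of equation (13). This is a sketch under assumed, illustrative parameter values; the function names and the truncation point are hypothetical.

```python
import math

def bivariate_pmf(y, x, m, p):
    """Equation (13): symmetrical bivariate negative binomial probability."""
    log_phi = (
        p * math.log(p / (p + 2.0 * m))
        + math.lgamma(y + x + p) - math.lgamma(p)
        - math.lgamma(y + 1) - math.lgamma(x + 1)
        + (y + x) * math.log(m / (p + 2.0 * m))
    )
    return math.exp(log_phi)

def array_moments(x, m, p, y_max=2000):
    """Mean and variance of y within the array for a fixed x (truncated sum)."""
    joint = [bivariate_pmf(y, x, m, p) for y in range(y_max)]
    marginal = sum(joint)  # numerically equal to f(x) of equation (7), k = 1
    mean = sum(y * q for y, q in enumerate(joint)) / marginal
    var = sum((y - mean) ** 2 * q for y, q in enumerate(joint)) / marginal
    return mean, var

# Illustrative values only (assumptions, not estimates from the paper).
m, p, x = 2.0, 1.5, 4
rho = m / (p + m)                   # equation (17)
mean, var = array_moments(x, m, p)
predicted_mean = rho * (x + p)      # regression of y on x
predicted_var = (1.0 + rho) * predicted_mean
```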
It is interesting to note that for the limiting correlation $\rho \to 1$ the bivariate negative 
binomial distribution does not tend towards a concentration along the regression line as is 
the case with the bivariate normal distribution. In a strict mathematical sense the limiting 
process may be obtained in two ways: 

(1) $\lim_{p \to 0} \rho = \lim_{p \to 0} \dfrac{m}{p+m} = 1$, for finite $m$. 

(2) $\lim_{m \to \infty} \rho = \lim_{m \to \infty} \dfrac{m}{p+m} = 1$, for finite $p$. 

In the first case the regression line will tend towards 
$$\bar{y}_x = x, \tag{24}$$
but the variance within arrays towards 
$$\mu_{2,\text{arr.}} = 2x, \tag{25}$$
which is far from zero spread. The vast majority of observations will be concentrated in the 
(0, 0) cell with the remainder thinly spread over a wide range of absences. 
In the second case the regression line will tend towards 
$$\bar{y}_x = x + p. \tag{26}$$
The slope of this line (45°) is again the same as in the first case, but a constant has to be 
added. The array variance becomes 
$$\mu_{2,\text{arr.}} = 2x + 2p, \tag{27}$$
which is in fact maximal for $\rho = 1$. 

From a practical point of view the distinction between the two limiting processes is of 
little consequence with respect to the regression lines. As $m \to \infty$ the whole bivariate 
distribution will shift away from the (0, 0) corner, and thus to get it on to a frequency table for 
inspection, we must reduce the length on the paper represented by a unit of x or y; e.g. by 
dividing all observations by m. In this case the difference between equations (24) and (26) 
would be negligible as p/m would be exceedingly small. 

On the other hand, in practice the difference between the two limiting processes may be 
significant, depending upon whether one looks at array variability from the absolute or 
relative point of view. In the former case, which will be needed when predicting future 
absences for a given person, as $p \to 0$ and consequently $\rho \to 1$, the standard error of prediction 
for a fixed $x$ will first increase, pass through a maximum and then decrease. For the second 
limiting process, where $m \to \infty$ and $\rho \to 1$, the standard error of prediction for fixed $x$ will 
monotonically increase reaching a maximum at $\rho = 1$. This result is unusual in the sense 
that when predicting future absences for two different persons, with the same x and exposure 
periods, from two different groups, the standard error of prediction is smaller in respect 
of the one who comes from a lower correlation surface, and vice versa. 

In the latter case, when considering relative variability, the behaviour of the bivariate 
negative binomial surface does not seem to be anomalous. In this case the value of the 
correlation from the point of view of prediction may be expressed as the ratio: 

$$\frac{\text{Array variance}}{\text{Marginal variance}} = \frac{p\,(p+2m)\,(x+p)}{(p+m)^3}.$$

In both limiting processes, i.e. $p \to 0$ and $m \to \infty$, this ratio will tend to zero. 
Going back to the array distribution (21) and utilizing equations (10) and (11) we find 
for the shape coefficients of the array distributions after substitution of (20a) and (20b) 
$$\beta_{1,\text{arr.}} = \frac{(p+3m)^2}{m\,(p+2m)\,(x+p)}, \tag{28}$$
$$\beta_{2,\text{arr.}} = 3 + \frac{6}{x+p} + \frac{(p+m)^2}{m\,(p+2m)\,(x+p)}. \tag{29}$$
For $x$ large and $m$ and $p$ finite we find $\beta_{1,\text{arr.}} = 0$, $\beta_{2,\text{arr.}} = 3$, which means that irrespective of 
the magnitudes of the parameters of the bivariate negative binomial the array distributions 
will tend towards normality for large x. This is well in accordance with observations where 
we usually have the first array distributions very skew bell- or J-shaped, passing into skew 
bell-shaped for moderate x and finally approaching normality for the larger x. 

For $p \to \infty$ equations (28) and (29) become 
$$\beta_{1,\text{arr.}} = \frac{1}{m}, \qquad \beta_{2,\text{arr.}} = 3 + \frac{1}{m}.$$


These are the shape coefficients of the Poisson distribution, and as the x’s have dropped out 
we conclude that the shapes of all array distributions are alike. In such a case our bivariate 
negative binomial has deteriorated into an uncorrelated bivariate Poisson (with $\rho = 0$); 
hence this is the case of non-proneness. 

From the above we may derive a quick but rough rule of thumb method for the detection 
of proneness from a scattergram. If there is a shape transformation of observed array distri- 
butions from the lower x’s to the higher ones we have proneness. If not, then we deal with 
a phenomenon generated by chance alone without proneness as a causal factor. 

From the foregoing it is clear that the product-moment correlation $\rho$ of the bivariate 
negative binomial cannot be interpreted in the same way as a correlation coefficient of the 
same magnitude derived from a normal bivariate distribution. However, it is still possible 
to make some useful predictions based on the bivariate negative binomial. In a certain class 
of selection problems it is required to forecast on the strength of an observed value x whether 
an individual will belong to one of two classes on the variable y. In such a situation it is more 
meaningful to work with ‘operating characteristics’ instead of a single correlation coefficient 
(Sichel, 1952; Arbous, 1953). Furthermore, without altering their meaning, the principles 
of operating characteristics may be applied to all bivariate distributions irrespective of 
their analytical form. 

For our special purpose the operating characteristic of absences is defined as the probability 
of a person with $x$ absences in the predictor period having $\xi$ or more absences in the follow-up 
interval. This may be stated as 
$$\text{prob.}(y \ge \xi \mid x) = 1 - \text{prob.}(y < \xi \mid x),$$
$$\text{prob.}(y \ge \xi \mid x) = 1 - \left(1 - \frac{m}{p+2m}\right)^{x+p} \frac{1}{\Gamma(x+p)} \sum_{y=0}^{\xi-1} \frac{\Gamma(y+x+p)}{\Gamma(y+1)} \left(\frac{m}{p+2m}\right)^{y}. \tag{30}$$





3. TESTING THE ADMISSIBILITY OF THE WORKING MODEL 


The existence or non-existence of proneness in an observed group of individuals cannot be 
established sufficiently by simply fitting univariate distributions to the observations. The 
authors agree with Maritz (1950) that correlational (bivariate) analysis is much more 
meaningful than ordinary curve fitting. Maritz's point of view has recently been queried 
by Blum & Mintz (1951). However, it appears to us that univariate analysis sacrifices 
information contained in the original data. If an individual has a total of z absences in an 
observational period made up of x occurrences in the first and y in the second half, curve 
fitting will make use of z only, whereas the correlation approach utilizes the information 
contained in both x and y. In other words, to test the hypothesis of proneness we anticipate 
that the correlation approach will be statistically more powerful (in the Neyman-Pearson 
sense) than the univariate method. As a practical test for the establishment of proneness we 
therefore recommend: 

Split the total observational period into two halves and correlate by product- 
moment method the absences of individuals in the first period with their respective absences 
in the second. If we do not know the underlying bivariate distribution which generates the 
correlation coefficient we have no means of testing for significance. For this reason we should 
look at low coefficients as rather doubtful in establishing proneness. Even if we have found 
a large coefficient we cannot do much with it when it comes to the down-to-earth question 
of predicting absences in the future. For the solution of this problem we must have a model 
of the bivariate distribution of absences in two equal non-overlapping periods. 

If our initial assumptions are correct, then the absences in a single observational period 
should follow the univariate negative binomial and, therefore, the absences in two equilong, 
non-overlapping periods may be described by the corresponding symmetrical bivariate 
distribution. 

Scores of investigators have pointed out before us that the univariate negative binomial 
distribution may be generated by entirely different, often opposing assumptions. A satis- 
factory fit of observed distributions to theoretical negative binomials, as evidenced by the 
$\chi^2$-test, means little if we must interpret our data physically. 

The Greenwood-Yule model, which we have adapted to absenteeism in this paper, rests 
on the following premises: 

(1) The number of absences incurred by an individual in repeated exposure periods all 
of equal length $t$ follows a Poisson distribution with mean $\lambda$. (Our equation (1) if $k = 1$.) 

(2) The mean parameter $\lambda$ is invariant in time. 

(3) The individual $\lambda$'s in a group of persons observed for one period of length $t$ follow 
a Pearson Type III law. (Our equation (2).) 

Over shorter exposure periods (1) and (2) will often be met. For longer durations (2) is the 
most exacting of the required conditions. 

We may write 
$$\lambda = \lambda_1 + \lambda_2,$$
where $\lambda_1$ = mean number of absences due to environmental factors, $\lambda_2$ = mean number of 
absences due to individual proneness, $\lambda$ = total mean number of absences. It is clear that 
$\lambda$ will change in time if an individual's environment is altered, i.e. in the event of promotion, 
better working conditions, different type of work, etc. For this reason one tries to keep 
environmental conditions as homogeneous as is possible in a modern industrial set-up when 
studying proneness in a group of workers. Even $\lambda_2$, which is person-centred, may vary, as 
human beings have the habit of learning and adapting themselves if exposed to one and the 
same situation for any length of time. 

In practice it will be found that the $\chi^2$-test is not powerful enough to detect time variations 
in the mean parameter $\lambda$. Due to the small number of observations in intra-personal 
investigations (usually of the order 50) the Poisson hypothesis will rarely be rejected. The 
same argument applies to the Poisson index of dispersion. 

Assumption (3) is usually met because a Pearson Type III law is extraordinarily flexible 
and can represent J-shaped, skew bell-shaped and normal distributions. 

As already mentioned we do not expect conditions (1) and (2) to hold exactly in a real life 
situation. If we assume that the departure from the Poisson hypothesis for individuals is 
not of such a nature as to falsify completely the assumptions in a given time period, we may 
proceed to work with our model (and it is for this reason that we prefer to call it a working 
model). 

We shall fit a symmetrical bivariate negative binomial to our data. A $\chi^2$-test on the 
bivariate model, a graphical test for linearity of regression and a comparison between the 
actual product-moment correlation $r$ and $\rho$ as calculated from the estimates of the 
parameters of the univariate negative binomial (total exposure period) will suffice to 
indicate the plausibility of the working model. 

In Table 1* the absences of 248 shift workers from the same division of a large industrial 
plant have been recorded for a two-yearly period (1947-8). The product-moment correlation 
is $r = 0.728$, 
which indicates absence proneness in the group. 

In fitting a symmetrical bivariate negative binomial (equation (13)) to the observed 
absences of Table 1 we must estimate parameters m and p, where m is the mean of the 
marginal distributions. This can be done by estimating p and M = 2m of the univariate 
negative binomial distribution for the total period of exposure. 

Formulae, graphs and tables for the estimation of the parameters of a negative binomial 
distribution were previously given both for the method of moments and the maximum-likelihood 
solution (Sichel, 1951). Entering the efficiency graphs of that paper with our 
moment estimates $M = 9.2$ and $p = 1.52$, 
we find Eff. $(p) = 0.6$, 
hence we proceed to the maximum-likelihood estimation which yields 
$$\hat{m} = \tfrac{1}{2}\hat{M} = 4.59072, \qquad \hat{p}_{\text{max. lik.}} = 1.58542.$$
Substitution of these values into (13), and evaluation of the theoretical cell frequencies, 
leads to the expected figures shown in parentheses in Table 1. A $\chi^2$-test comparing observed 
with expected frequencies of the symmetrical bivariate negative binomial resulted, for 
13 degrees of freedom and $\chi^2 = 17.0$, in $P = 0.20$. On this basis we are unable to reject the 
hypothesis. 

Using the maximum-likelihood estimates we find from equation (17) 
$$\hat{\rho} = 0.743,$$
which compares satisfactorily with the previously found product-moment correlation 
$r = 0.728$. 

We should, however, point out that we have repeatedly observed $r$ to be slightly smaller 
than $\hat{\rho}$ if the exposure periods are fairly long. Similar observations were made in the field 
of accidents by Newbold (1927). The reason for this is not quite clear at present. 
Substitution of $\hat{\rho}$ and $\hat{p}$ into equation (22) gives the regression line 
$$\bar{y}_x = 0.743x + 1.178.$$
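
The fitted correlation and regression line follow directly from the two maximum-likelihood estimates. A minimal sketch of that arithmetic, with hypothetical function and variable names, is:

```python
def regression_line(m, p):
    """Slope and intercept of the regression of y on x, equations (17) and (22)."""
    rho = m / (p + m)
    return rho, rho * p

# Maximum-likelihood estimates quoted in the text for the 1947-8 data.
m_hat, p_hat = 4.59072, 1.58542
slope, intercept = regression_line(m_hat, p_hat)

def predicted_absences(x):
    """Expected absences in the second period given x in the first."""
    return slope * x + intercept
```

The slope and intercept obtained this way should round to the coefficients of the regression line quoted above.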


* This is a folding table facing p. 90. 
Although the observed means follow a linear regression, as required by our model, 
closer scrutiny of the graph reveals that out of eighteen array means shown, twelve lie 
below the calculated regression line and only six above. Therefore we have a tendency for 
the observed number of absences in 1948 to be slightly less than predicted on the basis of 


the observed absences in 1947. This fact is also borne out by the observed means of our 
marginal distributions. We have 


1947: $\bar{x} = 4.70564$ ($n = 248$), 
1948: $\bar{y} = 4.47581$ ($n = 248$). 


For large samples such as we are dealing with, a statistical test of the difference between 
correlated means of negative binomial distributions is available. When applied to our data 


it was found that the hypothesis of chance fluctuation in the observed means could not be 
rejected. 


4. THE APPLICATION OF THE WORKING MODEL 


If we consider $\xi$ or more absences per exposure period as worse than tolerable, we certainly 
shall wish to single out all men who are likely to have $\xi$ or more absences in any future 
period. We cannot do this with certainty, even for correlations approaching unity, as proved in 
the mathematical deductions. On the other hand, the operating characteristic (equation (30)) 
will tell us with what probability we are able to spot future absence offenders, given their 
absence numbers in a previous period. 

In the preceding section we have treated the 1947 absences as the ‘predictor’ and the 
1948 data as the ‘criterion’. However, our ‘validation’ was done a posteriori. In fact, the 
estimates for m and p were based on the combined period 1947-8. We shall now demonstrate 
that we may forecast without an initial follow-up by assuming no knowledge of the events 
in 1948 whatsoever. After having made our predictions we shall compare them with the 
actual happenings of that year. This then will be a true a priori validation. 

The standard unit of observational time was defined as one 4-weekly period, so that one 
calendar year was made up of thirteen 4-weekly periods which included an annual holiday 
of 3 weeks. For correlational purposes we split the first twelve 4-weekly periods during 
1947 into equal halves. The calculated product-moment correlation was $r = 0.55$ for 
$n = 318$. From this we infer absence-proneness in the group. The data conform with the 
linear regression hypothesis of the bivariate negative binomial. Next we estimated the 
parameters of the univariate negative binomial for the total year 1947 which, taking 
cognisance of the 3-weekly annual leave, corresponded to an actual exposure period of 
12.25 4-weekly periods. Making use of the tables for the maximum-likelihood solution and 
the formulae for the standard errors (Sichel, 1951) we arrived at the following estimates of 
the parameters 
$$\hat{m} = 4.9340 \pm 0.2452 \quad (n = 318),$$
$$\hat{p} = 1.7164 \pm 0.1925 \quad (n = 318).$$
A comparison was made between the observed and expected frequencies for 1947 using 
the above estimates in equation (7). Agreement was satisfactory and, on the strength of 
the $\chi^2$-test result, 
$$P(11.419; 11) = 0.42,$$
we cannot reject the hypothesis of our model. 
If we assume that the subjects spread their annual leave evenly over the total year, the 
average exposure for a 24-weekly period is only 22.615 weeks, or measured in 4-weekly units 
5.654. Our estimate of $m$ for an actual exposure of 12.25 units was 4.9340, hence we estimate 
the mean number of absences per 4-weekly exposure to be 
$$4.9340/12.25 = 0.40278,$$
in view of equation (8). For 5.654 4-weekly periods we find 
$$\hat{m}' = 0.40278 \times 5.654 = 2.277,$$
which compares satisfactorily with the observed means of 2.267 and 2.333. Our estimate 
of the product-moment correlation based on the parameters of the univariate negative 
binomial becomes (from equation (17)) 
$$\hat{\rho} = 2.277/(1.716 + 2.277) = 0.57,$$
which compares favourably with $r = 0.55$ by ordinary product-moment correlation. 
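
The chain of arithmetic in this paragraph can be retraced in a few lines. The sketch below assumes that the 22.615 weeks arise as 24 weeks reduced in the proportion 49/52 (3 weeks of leave spread over a 52-week year); that reading reproduces the quoted figure but is an interpretation, and all variable names are hypothetical.

```python
# Estimates quoted in the text for the full year 1947 (12.25 four-weekly units).
m_year, p_hat, k_year = 4.9340, 1.7164, 12.25

# Assumed reading: leave of 3 weeks spread evenly over a 52-week year.
exposure_weeks = 24.0 * (1.0 - 3.0 / 52.0)   # average exposure in a 24-week half
k_half = exposure_weeks / 4.0                # the same, in 4-weekly units

m_unit = m_year / k_year                     # mean per 4-weekly unit, equation (8)
m_half = m_unit * k_half                     # mean for one half of the split year
rho_half = m_half / (p_hat + m_half)         # equation (17)
```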
Distributions were also observed for the following exposure periods in 1947: 
3 4-weekly periods, 
6 4-weekly periods, 
8.25 4-weekly periods. 
From equations (8) and (9) we computed the expected means and standard deviations by 
substituting $\hat{m} = 0.40278$, $\hat{p} = 1.7164$, 
being the estimates for a 4-weekly exposure which in turn were obtained from the year 


distribution. The observed and expected statistics are given in Table 2. Again the agreement 
between theory and observation is plausible. 


Table 2. Showing observed and expected statistics for various lengths of exposure

 Number of           $\hat{p}$                 $\bar{x}$                    $s$
 4-weekly
 periods $k$    Observed  Expected     Observed  Expected     Observed  Expected
    3.00          2.01      1.72         1.12      1.21         1.32      1.43
    6.00          1.27      1.72         2.50      2.42         2.72      2.41
    8.25          1.93      1.72         3.33      3.32         3.01      3.12
   12.25          1.72      1.72         4.93      4.93         4.37      4.37

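
The expected means and standard deviations in Table 2 can be regenerated from equations (8) and (9). The following sketch uses the two estimates quoted in the text; the function name and the dictionary layout are assumptions made for illustration.

```python
import math

# Estimates for a single 4-weekly exposure, as quoted in the text.
m_unit, p_hat = 0.40278, 1.7164

def expected_mean_sd(k):
    """Expected mean (equation (8)) and standard deviation (from equation (9))."""
    mean = k * m_unit
    variance = mean * (1.0 + mean / p_hat)
    return mean, math.sqrt(variance)

# Exposure lengths, in 4-weekly periods, listed in Table 2.
expected = {k: expected_mean_sd(k) for k in (3.0, 6.0, 8.25, 12.25)}
```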





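Before proceeding to the prediction of absences in 1948 we shall derive the standard 
error of $\hat{\rho} = \hat{m}/(\hat{p} + \hat{m})$. 
For large sample sizes we have 
$$\text{var}(\hat{\rho}) = \left(\frac{\partial \rho}{\partial p}\right)^{2} \text{var}(\hat{p}) + \left(\frac{\partial \rho}{\partial m}\right)^{2} \text{var}(\hat{m}) + 2 \left(\frac{\partial \rho}{\partial p}\right) \left(\frac{\partial \rho}{\partial m}\right) \text{cov}(\hat{m}, \hat{p}).$$
After substitution of the appropriate partial derivatives and the variances and covariance 
given previously by Sichel (1951), we find after replacing the statistics by their expectations 
an explicit expression for s.e.$(\hat{\rho})$, equation (31), which is illegible in the source. 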
On the basis of the estimates $\hat{m}$ and $\hat{p}$ derived from the 1947 absence distribution we can 
predict the correlation between the 1947 and 1948 absences, i.e. 
$$\hat{\rho}_{47,48} = 4.9340/(1.7164 + 4.9340) = 0.742.$$
From (31) we compute the standard error of $\hat{\rho}$ and finally we have 
$$\hat{\rho}_{47,48} = 0.742 \pm 0.024 \quad (n = 318).$$
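
The large-sample variance formula for the estimated correlation can be evaluated from the reported standard errors alone. The sketch below is not the paper's equation (31); it applies the general formula with the partial derivatives of $\rho = m/(p+m)$ and assumes, as a labelled simplification, that the covariance between the two maximum-likelihood estimates is zero. Function and variable names are hypothetical.

```python
import math

def se_rho(m, p, se_m, se_p, cov_mp=0.0):
    """Large-sample standard error of rho = m / (p + m).

    cov_mp defaults to zero, which is an assumption of this sketch.
    """
    d_rho_dm = p / (p + m) ** 2     # partial derivative with respect to m
    d_rho_dp = -m / (p + m) ** 2    # partial derivative with respect to p
    variance = (
        d_rho_dp ** 2 * se_p ** 2
        + d_rho_dm ** 2 * se_m ** 2
        + 2.0 * d_rho_dp * d_rho_dm * cov_mp
    )
    return math.sqrt(variance)

# Estimates and standard errors quoted in the text for 1947 (n = 318).
m_hat, se_m = 4.9340, 0.2452
p_hat, se_p = 1.7164, 0.1925

rho_hat = m_hat / (p_hat + m_hat)
rho_se = se_rho(m_hat, p_hat, se_m, se_p)
```

Under the zero-covariance assumption the result is close to the standard error quoted in the text.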


Of the 318 workers exposed in 1947 only 248 survived employment in 1948. The bivariate 
distribution for their absences in 1947-8 has already been given in Table 1 and the corre- 
sponding product-moment correlation was 


$$r_{47,48} = 0.728 \quad (n = 248),$$


which checks the predicted correlation within one standard error. 

We have not lost sight of the fact that the shrinkage in observed correlation may have 
been due to the more absence-prone persons leaving employment before a lengthy period 
of exposure has elapsed. Without giving details we have sufficient evidence to suggest that 
this is not so. 

Substitution of $\hat{m}$ and $\hat{p}$ based on the yearly period of 1947 into equation (30), and making 
$\xi = 6$ (for the sake of example) gives the operating characteristic 
$$\text{prob.}(y \ge 6 \mid x) = 1 - (0.5741)^{x+1.7164}\, \frac{1}{\Gamma(x+1.7164)} \sum_{y=0}^{5} \frac{\Gamma(y+x+1.7164)}{\Gamma(y+1)}\, (0.4259)^{y}.$$
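
The operating characteristic can be tabulated for each value of $x$ with a short routine. This is a minimal sketch of equation (30) using the 1947 estimates quoted in the text; the function name, the argument name `xi` for the criterion cutting-score, and the range of $x$ are assumptions made for illustration.

```python
import math

def operating_characteristic(x, xi, m, p):
    """Probability of xi or more absences in the follow-up period, given x."""
    q = m / (p + 2.0 * m)
    r = x + p
    below = 0.0
    for y in range(xi):  # sums the array probabilities for y = 0, ..., xi - 1
        log_term = (
            r * math.log(1.0 - q)
            + math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + y * math.log(q)
        )
        below += math.exp(log_term)
    return 1.0 - below

# Estimates quoted in the text for the year 1947.
m_hat, p_hat = 4.9340, 1.7164
oc_curve = [operating_characteristic(x, 6, m_hat, p_hat) for x in range(25)]
```

The logarithmic form with `math.lgamma` is used only to avoid overflow in the gamma functions for large $x$; the sum itself is the one written in equation (30).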


The predicted O-C curve and the proportion of workers who actually had 6 or more 
absences in 1948 for each group of individuals with x absences in 1947 have been charted 
in Fig. 1. Although all observations fall well within the probability belt of individual pro- 
portions, we find the majority of points to plot slightly below the theoretical curve. This 
again is but another manifestation of shrinkage in the mean for 1948. 

The O-C curve was based on the absences of 318 subjects in 1947. During the validation
year (1948) 70 individuals left their employment so that only 248 could be followed up. To
test whether ‘restriction’ had taken place we calculated the means and standard deviations
for the 1947 absences of the two groups of $n_1 = 318$ and $n_2 = 248$. We found


                         1947 absences
              $n_1 = 318$            $n_2 = 248$
              $\bar{x}_1 = 4.93$     $\bar{x}_2 = 4.70$
              $s_1 = 4.37$           $s_2 = 4.32$.


The above figures suggest that the slight discrepancies between observations and theory 
are not due to ‘restriction of range’. 

In the practical situation we may be interested to single out for curative measures all 
those who are likely to have 6 or more absences in 1948. By necessity the ‘selection’ has to 
be undertaken at the end of 1947. There will always be some few individuals who have say 
0, 1, 2 absences in the predictor year and 6 or more in the following period. However, the 
probability of spotting these is small (see O-C curve, Fig. 1) and we shall do more harm by 
singling out many for treatment who do not need it than finding the few who will have a bad 
record in the future period. Clearly what we need is an assurance that the error and extra
effort of attending to people who do not need treatment are small in comparison to the gains
originating from a reduction of absenteeism among persons whom we can justifiably call
absence-offenders. It appears logical to set the predictor cutting-score $\alpha$ at the same level
as $\beta$, because if 6 or more absences in 1948 are considered worse than tolerable then the same
argument ought to apply to 1947 as well. From the O-C curve we find for $\alpha = 6$ the probability
of treating the correct person to be at least 0.475.


[Figure: probability (y ≥ 6 | x) in 1948 plotted against the number of absences in 1947 (x).]

Fig. 1. Operating characteristic with probability limits for observed proportions based
on univariate negative binomial distribution for 1947.

Fig. 2. Cumulative frequency distribution of observed and expected absences in 1947.


In Fig. 2 we have drawn the theoretical and observed cumulative frequency distribution
of absences in 1947. From it we may infer what percentage of the working population must
be singled out for closer attention with respect to absenteeism for any given level of $\alpha$.

For $\alpha = 6$ we find that on the average 35 % merit attention. In our particular case we
should have selected 103 (32.4 %) from the total group of 318 individuals.

It is unavoidable that some of the potential absentees will terminate employment before
the follow-up period has elapsed. Of 103 subjects having had 6 or more absences in 1947,
76 were still employed at the end of 1948. In Table 3 predictions and follow-up results are
compared. Of the 76 absence-offenders in 1947 who were selected for ‘treatment’ 55 (= 72 %)
were expected to be absence-offenders in 1948 as well. We actually observed 52 (= 68 %)
with 6 or more absences in 1948.


Table 3. Showing comparison between expected and observed numbers
of absence-offenders in 1948

Number of     Probability of     Number of          Number expected     Number of persons
absences in   6 or more          persons exposed    to have 6 or more   who actually had
1947          absences in 1948   in 1947-8          absences in 1948    6 or more absences
                                                                        in 1948
x             p(x)               E(x)               E(x) p(x)           n(x)

 6            0.475              16                 8                   7
 7            0.568               8                 4                   4
 8            0.652              10                 6                   5
 9            0.726               8                 6                   7
10            0.786               7                 6                   6
11            0.840               3                 2                   2
12            0.881               5                 4                   4
13            0.911               6                 6                   5
14            0.931               4                 4                   3
15            0.948               2                 2                   2
16            0.961               2                 2                   2
17            0.971               2                 2                   2
18            0.979               -                 -                   -
19            0.985               -                 -                   -
20            0.990               -                 -                   -
21            0.994               2                 2                   2
22            0.997               1                 1                   1

Totals                           76                55                  52











Bearing in mind that all our predictions were based on the 1947 data alone and that our
working model does not take account of factors which may cause significant time variations
in individual $\lambda$’s, we conclude that the agreement between predicted and actual events as
revealed by Table 3 is highly satisfactory.
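The totals of Table 3 can be checked by simple arithmetic. The sketch below, in Python, sums the expected numbers E(x) p(x) without the row-by-row rounding used in the table. Two cells are illegible in the scanned table (the observed number for x = 7 and the exposed number for x = 13); the values 4 and 6 used here are assumptions inferred from the printed totals 52 and 76.

```python
# Each row: (x, p(x), persons exposed E(x), persons observed with 6 or more absences n(x)).
# Rows for x = 18, 19, 20 have no persons exposed and are omitted.
ROWS = [
    (6, 0.475, 16, 7),
    (7, 0.568, 8, 4),    # observed value inferred from the printed total of 52
    (8, 0.652, 10, 5),
    (9, 0.726, 8, 7),
    (10, 0.786, 7, 6),
    (11, 0.840, 3, 2),
    (12, 0.881, 5, 4),
    (13, 0.911, 6, 5),   # exposed value inferred from the printed total of 76
    (14, 0.931, 4, 3),
    (15, 0.948, 2, 2),
    (16, 0.961, 2, 2),
    (17, 0.971, 2, 2),
    (21, 0.994, 2, 2),
    (22, 0.997, 1, 1),
]

exposed_total = sum(exposed for _, _, exposed, _ in ROWS)
expected_total = sum(p * exposed for _, p, exposed, _ in ROWS)
observed_total = sum(observed for _, _, _, observed in ROWS)

if __name__ == "__main__":
    print("exposed:", exposed_total)
    print("expected offenders:", round(expected_total, 1))
    print("observed offenders:", observed_total)
```

The unrounded expected total is slightly under 55, which agrees with the printed total once rounded.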


SUMMARY 


In this study a model for absence-proneness has been developed, and we have illustrated 
the practical use to which the concept of proneness can be put. This application to the needs 
of industry can be summarized as follows: 

(a) It is necessary to accumulate one year’s data only before preventive action can be 
taken in respect of absenteeism. 

(b) These data can be split into two halves for the purpose of establishing the phenomenon
of proneness by correlation techniques.

(c) In terms of the estimates of the parameters of the distribution observed, a mathematical
estimate can be made as to the magnitude of this correlation coefficient if that
year’s data were compared with the next, which are not yet to hand, and which it is desired
to predict.

(d) It has been established in our case that, though this mathematical model, strictly
speaking, may not be the correct one, it will be sufficiently accurate for practical purposes, 
provided the two consecutive exposure periods do not exceed one year. Furthermore, in 
any given year the accuracy of theoretical estimates can be checked as has already been 
done in this study. The investigator is not compelled therefore merely to place implicit faith 
in the results of the present research. The efficiency with which we can predict the absences 
of individuals in the second year before these have actually occurred, can then be established 
in terms of operating characteristics. 

(e) These operating characteristics will give management information which will enable 
it to decide on the appropriate remedial measures to be taken in the given circumstances, 
with a clear indication of the consequences and effectiveness of its actions. 

(f) When, in actual practice, remedial measures are applied at the end of the first year and 
before the second year’s data have become available, it will not be possible to test the 
validity of the working model developed here. This does not mean that the theory has broken 
down entirely. In fact, the efficacy of the measures taken can be judged by the extent to
which observation deviates from theory, over and above the adjustment process suggested
above. Where no such deviation occurs, this would indicate that the measures had not been effective.
Deviations should naturally be for the better. One would expect the observed points to 
shift uniformly to one side of the curve. It might well be possible in the future to establish 
along these lines a means of assessing the efficiency of the remedial measures taken. 
Table 1. Observed and expected frequencies during 1947-8


ON THE SUPERPOSITION OF RENEWAL PROCESSES


By D. R. COX AND WALTER L. SMITH
Statistical Laboratory, University of Cambridge 


Suppose that there are a number of independent sources at each of which events occur from time to 
time. The intervals between successive events at any one source are assumed to be independent 
random variables all with the same distribution, so that each source constitutes a renewal process 
of a familiar type. The outputs of the sources are combined into one pooled output. Statistical 
properties of the pooled output are investigated and the results applied to a problem in neuro- 
physiology. 


1. INTRODUCTION


This work was suggested by problems in neuro-physiology, although there are very probably 
other applications. Suppose that a number of neurons independently send discrete pulses 
to a common central nerve cell. We shall investigate the relation between the statistical 
properties of the sequence of impulses from an individual neuron and the corresponding 
properties of the combined sequence of pulses at the central cell. A very similar problem 
arose in the recent study by Fatt & Katz (1952) of spontaneous subthreshold activity at
motor nerve endings. They found that at the tips of certain many-branching nerve endings,
there are a large number of ‘active spots’ each giving rise to localized electrical pulses in
the muscle fibre with which they make common contact. Here the statistical problem is to
infer as much as possible about the individual ‘active spots’ from observations on the 
sequence of pulses in the muscle fibre. 
Fig. 1. The pooling of outputs.


We can state the problem formally as follows. Suppose that there are a number of
sources, at each of which events occur from time to time, and that the outputs of the sources
are combined into one pooled output. Fig. 1 illustrates this for three sources. The events
are marked by short vertical lines and the outputs of the sources $S_1$, $S_2$ and $S_3$ are combined
into one pooled output $P$. In general, the output of the $i$th source is defined by an increasing
sequence $[t_j^{(i)}]$, where $t_j^{(i)}$ is the time at which the $j$th event occurs at the $i$th source. The pooled
output is defined by rearranging the $t_j^{(i)}$ into a single increasing sequence. Given the statistical
properties of the separate sequences $[t_j^{(i)}]$, we wish to find the statistical properties of
the pooled output.

In an earlier paper (Cox & Smith, 1953a) we examined what happens when the individual
sequences $S_i$ are strictly periodic, i.e. when $S_i$ is $[\theta_i, 2\theta_i, 3\theta_i, \ldots]$. In the present paper we
shall suppose that the intervals between successive events on any one source are independent
random variables, all with the same distribution, and that random variables associated
with different sources are independent. Thus the output of one source forms a renewal
process of the type that has been extensively studied (Feller, 1941).
properties of the pooled output and in §5 we investigate the limiting behaviour as the 
number of sources becomes large. A numerical application to experimental data is described 
in §6. 


2. THE DELAY DISTRIBUTION FOR A SINGLE SOURCE


In this section we consider a single source. Let the intervals between successive events be
a sequence $[X_n]$ of independent identically distributed random variables all with an
absolutely continuous distribution function $F(x)$, and a frequency function $f(x)$. The $n$th
event occurs at time $Z_n = X_1 + \ldots + X_n$, and if $f_n(x)$ is the frequency function of $Z_n$,

$$h(x) = \sum_{n=1}^{\infty} f_n(x) \qquad (1)$$

is the total density of events at $x$, irrespectively of serial number.

Fix a time $t$ and define the delay $Y(t)$ to be the time measured from $t$ back to the immediately
preceding event. The frequency function $g(y; t)$ of $Y(t)$ will be called the delay
function at $t$. Since the total density of events at $x$ is $h(x)$, and the chance that the interval
between events exceeds $y$ is

$$\mathcal{F}(y) = 1 - F(y), \qquad (2)$$

we have

$$g(y; t) = h(t - y)\,\mathcal{F}(y). \qquad (3)$$

We shall be concerned with the behaviour a long time from the beginning of the process
so that we consider

$$g(y) = \lim_{t \to \infty} g(y; t). \qquad (4)$$

It is well known that under weak conditions (Feller, 1941; Tacklind, 1945; Cox & Smith,
1953b; Smith, 1953)

$$\lim_{x \to \infty} h(x) = 1/\mu, \quad \text{where } \mu = E(X_1), \qquad (5)$$

thus proving that the limit (4) exists and that

$$g(y) = \mathcal{F}(y)/\mu. \qquad (6)$$

We call $g(y)$ the equilibrium delay function corresponding to the parent distribution $f(x)$.

Conversely, given $g(y)$, $f(x)$ is determined by

$$f(x) = -\frac{1}{g(0)}\,\frac{dg(x)}{dx}. \qquad (7)$$
The derivation of (6), and hence also of (7), depends only on the assumptions that 

(i) the distribution function of the intervals between successive events is F(x); 

(ii) the density of events at $t$ tends to $1/\mu$ as $t$ tends to infinity.

No use is made of the independence of successive intervals other than in the proof of (ii). 
We shall later apply (7) when conditions (i) and (ii) can be verified although successive 
intervals are not independent. 

It is intuitively obvious, and can be proved, that the equilibrium frequency function of
the time interval measured forward from a fixed point to the next succeeding event is the
same as that of the equilibrium delay.*

It is easy to express the moments of the delay in terms of the moments of the parent
distribution and, in particular, to show that the mean delay is $(\sigma^2 + \mu^2)/2\mu$, where $\mu$ and $\sigma$
are the mean and standard deviation of the parent distribution.
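
The mean-delay formula can be checked numerically for particular parent distributions. The sketch below, in Python, integrates y g(y) with g(y) = {1 - F(y)}/μ by the midpoint rule and compares the result with (σ² + μ²)/2μ. The two parents used (rectangular on (0, 1) and exponential with unit mean) and all function names are illustrative choices, not taken from the paper.

```python
import math


def mean_delay_numeric(survivor, mu, upper, steps=200_000):
    """Midpoint-rule integral of y * survivor(y) / mu over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += y * survivor(y) / mu
    return total * h


def mean_delay_formula(mu, sigma2):
    """Mean delay (sigma^2 + mu^2) / (2 mu)."""
    return (sigma2 + mu * mu) / (2.0 * mu)


# Rectangular parent on (0, 1): survivor function 1 - y, mean 1/2, variance 1/12.
rect_numeric = mean_delay_numeric(lambda y: 1.0 - y, 0.5, 1.0)
rect_formula = mean_delay_formula(0.5, 1.0 / 12.0)

# Exponential parent with unit mean: survivor function exp(-y), variance 1.
# The upper limit 40 truncates a negligible tail.
expo_numeric = mean_delay_numeric(lambda y: math.exp(-y), 1.0, 40.0)
expo_formula = mean_delay_formula(1.0, 1.0)

if __name__ == "__main__":
    print(rect_numeric, rect_formula)
    print(expo_numeric, expo_formula)
```

For the exponential parent the mean delay equals the mean interval itself, which reflects the lack of memory of a completely random series.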


3. THE NUMBER OF EVENTS IN INTERVALS OF GIVEN LENGTH 


This section is concerned with the probability distribution and variance of the number of
events falling in intervals of given length $t$. We shall deal mainly with the equilibrium
behaviour for intervals situated a long way from the origin, i.e. with the limiting behaviour
for intervals $(u, u + t)$ as $u$ tends to infinity. The principal result is that if $V(t)$ is the variance
of the number of events in an interval $t$,

$$V(t) \sim \frac{C^2 t}{\mu} \quad \text{as } t \to \infty, \qquad (8)$$

where $C$ is the coefficient of variation of the parent distribution. Further terms in the
expansion of $V(t)$ will be obtained. Feller (1949) proved (8) in his well-known paper on
recurrent events; the main differences between his and our work are that Feller considered
discrete, not continuous, time, and that our problem is easier since it is concerned with
equilibrium behaviour.

A heuristic proof of (8) follows easily from two equations in sequential analysis (Wolfowitz,
1947). If $Z_n = X_1 + \ldots + X_n$ is the cumulative sum when sampling stops after $n$
‘events’, we have

$$E(Z_n) = E(n)\mu \quad \text{and} \quad E(Z_n - n\mu)^2 = E(n)\sigma^2. \qquad (9)$$

In our application $Z_n$ is the time up to the $n$th renewal. If sampling is stopped after a long
time $t$, $Z_n \sim t$, where $n$ is the number of events occurring in $t$, and hence

$$E(n) \sim t/\mu \quad \text{and} \quad E(t - n\mu)^2 \sim \sigma^2 t/\mu,$$

i.e.

$$\operatorname{var}(n) \sim C^2 t/\mu. \qquad (10)$$

A similar argument working from Wald’s fundamental identity of sequential analysis is
given by Bartlett (1949).

A different approach is needed to obtain more precise results. Let $p_k^u(t)$ be the probability
that $k$ events occur in the interval $(u, u + t)$, so that in particular $p_k^0(t)$ is the probability of
$k$ events in $(0, t)$. Then

$$p_k^u(t) = \int_0^t g(y; u)\,p_{k-1}^0(t - y)\,dy. \qquad (11)$$

By (3) and (5) $g(y; u)$ is, for sufficiently large $u$, bounded in $0 \le y \le t$; also $0 \le p_{k-1}^0(t) \le 1$.
Therefore the theorem of bounded convergence gives that

$$\lim_{u \to \infty} \int_0^t g(y; u)\,p_{k-1}^0(t - y)\,dy = \int_0^t g(y)\,p_{k-1}^0(t - y)\,dy,$$

thus proving the existence of a limit for $p_k^u(t)$ as $u \to \infty$. We write

$$p_k^e(t) = \lim_{u \to \infty} p_k^u(t), \qquad (12)$$

and call $p_k^e(t)$ the equilibrium distribution of events.


* The delay is the same as the residual life-time discussed by Doob (1948), and the delay and the 
forward interval are the same as two quantities defined by Smoluchowski (Bartlett, 1953). 
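
By (6) and (12),

$$\frac{d p_k^e(t)}{dt} = \frac{1}{\mu}\left\{p_{k-1}^0(t) - \int_0^t f(t - \eta)\,p_{k-1}^0(\eta)\,d\eta\right\}.$$

But

$$p_k^0(t) = \int_0^t f(t - \xi)\,p_{k-1}^0(\xi)\,d\xi,$$

so that

$$\frac{d p_k^e(t)}{dt} = \frac{1}{\mu}\{p_{k-1}^0(t) - p_k^0(t)\}. \qquad (13)$$

If we define moment-generating functions by

$$M^0(\theta, t) = \sum_{k=0}^{\infty} e^{-k\theta}\,p_k^0(t), \quad M^e(\theta, t) = \sum_{k=0}^{\infty} e^{-k\theta}\,p_k^e(t), \qquad (14)$$

(13) leads to

$$M^e(\theta, t) = 1 - \frac{1 - e^{-\theta}}{\mu} \int_0^t M^0(\theta, \eta)\,d\eta. \qquad (15)$$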

Let $m_1^u(t), m_2^u(t), \ldots$ be the moments about zero of the distribution $p_k^u(t)$, where $u = 0, e$.
From (15) we deduce that when the appropriate quantities exist

$$m_1^e(t) = t/\mu, \qquad (16)$$

$$m_2^e(t) = t/\mu + \frac{2}{\mu} \int_0^t m_1^0(\eta)\,d\eta. \qquad (17)$$

If $V^e(t)$ is the equilibrium variance, (16) and (17) give

$$V^e(t) = \frac{1}{\mu} \int_0^t \psi(\eta)\,d\eta, \qquad (18)$$

where

$$\psi(t) = 1 + 2 m_1^0(t) - 2t/\mu. \qquad (19)$$
We denote the Laplace transform of a function by the corresponding bold symbol;
for example,

$$\boldsymbol{\psi}(s) = \int_0^{\infty} e^{-st}\,\psi(t)\,dt. \qquad (20)$$

Then it may be shown from (19) that

$$\boldsymbol{\psi}(s) = \frac{1}{s} + \frac{2\mathbf{f}(s)}{s\{1 - \mathbf{f}(s)\}} - \frac{2}{\mu s^2}. \qquad (21)$$

Let $\mu_1, \mu_2, \ldots$ be the moments about zero of $f(x)$ and let $C$ be the coefficient of variation,
$C^2 = (\mu_2 - \mu^2)/\mu^2$. Then expansion of (21) gives, as $s$ tends to zero,

$$\boldsymbol{\psi}(s) = \frac{C^2}{s} + \frac{3\mu_2^2 - 2\mu\mu_3}{6\mu^3} + o(1). \qquad (22)$$

This suggests that as $t \to \infty$

$$V^e(t) \sim \frac{C^2 t}{\mu} + \frac{3\mu_2^2 - 2\mu\mu_3}{6\mu^4}; \qquad (23)$$

a rigorous proof of (23) may be derived, under general conditions, from results in renewal
theory; see, for example, Owen (1949) and, more generally, Smith (1953). A similar formula
has been indicated by Feller (1948).

Equation (23) shows the behaviour of $V^e(t)$ for large $t$. We also require the behaviour for
small $t$, which may be obtained as follows. By (18)

$$V^e(t) = \frac{t}{\mu} + \frac{2}{\mu} \int_0^t \left\{m_1^0(\eta) - \frac{\eta}{\mu}\right\} d\eta, \qquad (24)$$

and it is easy to show from this that if as $t$ tends to zero

$$\int_0^t f(x)\,dx = O(t^{\beta}) \quad (\beta > 0), \qquad (25)$$

then

$$V^e(t) = \frac{t}{\mu} + o(t). \qquad (26)$$

Thus near $t = 0$ the variance-time curve behaves like that of a completely random series.

If $f(x)$ is given explicitly it may be possible to obtain more precise results than (23) and
(26). For example, if $f(x)$ is of the $\chi^2$ type with $2\nu$ degrees of freedom, $C^2 = 1/\nu$ and (23)
becomes

$$V^e(t) \sim \frac{t}{\nu\mu} + \frac{1}{6}\left(1 - \frac{1}{\nu^2}\right) \quad \text{as } t \to \infty. \qquad (27)$$

$V^e(t)$ may, of course, be found exactly, although the calculation is tedious for large $\nu$. As
an example, if $f(x)$ is of the $\chi^2$ type with 6 degrees of freedom and mean 3,

$$f(x) = \tfrac{1}{2} x^2 e^{-x},$$

and (20) and (21) lead to

$$V^e(t) = \frac{t}{9} + \frac{4}{27}\left(1 - e^{-3t/2} \cos\frac{\sqrt{3}\,t}{2}\right). \qquad (28)$$

We may compare this with

$$V^e(t) \sim \frac{t}{9} + \frac{4}{27}, \qquad (29)$$

obtained from (27).
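
The comparison between the exact variance and its asymptote is easy to tabulate. The following Python sketch evaluates the exact expression for the six-degrees-of-freedom example, t/9 + (4/27){1 - exp(-3t/2) cos(√3 t/2)}, alongside the large-t form t/(νμ) + (1 - 1/ν²)/6 with ν = 3 and μ = 3. The function names are illustrative, and the exact expression is the reading of equation (28) adopted here, which should be checked against the printed paper.

```python
import math


def v_exact(t):
    """Equilibrium variance for a chi-square type parent, 6 d.f., mean 3."""
    decay = math.exp(-1.5 * t) * math.cos(math.sqrt(3.0) * t / 2.0)
    return t / 9.0 + (4.0 / 27.0) * (1.0 - decay)


def v_asymptote(t, nu=3.0, mu=3.0):
    """Large-t form: t / (nu * mu) + (1 - 1 / nu^2) / 6."""
    return t / (nu * mu) + (1.0 - 1.0 / nu ** 2) / 6.0


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
        print(t, round(v_exact(t), 5), round(v_asymptote(t), 5))
```

Near t = 0 the exact curve has slope 1/μ = 1/3, as required for a completely random series, while for t of a few mean intervals it is already very close to the asymptote.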


4. THE POOLING OF OUTPUTS 
Suppose that the outputs of $N$ sources are pooled in the way described in §1. We shall
assume that the outputs of the individual sources are of the type considered in §§ 2, 3, that
different sources are independent, and finally, for convenience, that all the sources have the
same parent distribution $F(x)$. We shall consider throughout the equilibrium behaviour
a long time after the start of the process.

First we calculate the frequency distribution of the interval between successive events
in the pooled output. For each source we define delay random variables, $Y_i$, where $Y_i$ is the
time from a fixed epoch back to the immediately preceding event on the $i$th source. If $Y$
is the corresponding random variable for the pooled output

$$Y = \min(Y_1, \ldots, Y_N),$$

so that since the $N$ sources are independent,

$$\text{prob}(Y > y) = \left[\int_y^{\infty} g(x)\,dx\right]^N = \left[\int_y^{\infty} \frac{\mathcal{F}(x)}{\mu}\,dx\right]^N. \qquad (30)$$

Therefore the delay function for the pooled output is, on differentiating,

$$\frac{N\mathcal{F}(y)}{\mu} \left[\int_y^{\infty} \frac{\mathcal{F}(x)}{\mu}\,dx\right]^{N-1}.$$

By (7) it follows that the frequency function of the interval between successive events on
the pooled output is

$$f_p(y) = -\frac{d}{dy}\left\{\mathcal{F}(y) \left[\int_y^{\infty} \frac{\mathcal{F}(x)}{\mu}\,dx\right]^{N-1}\right\}. \qquad (31)$$

For example, if $f(y) = e^{-y}$, (31) gives

$$f_p(y) = N e^{-Ny},$$

expressing the obvious fact that if the separate outputs are completely random so is the
pooled output. As another example suppose that the parent distributions are rectangular
over (0, 1). Then (31) gives

$$f_p(y) = (2N - 1)(1 - y)^{2N-2}.$$

As $N \to \infty$,

$$f_p(y) \sim 2N e^{-2Ny}. \qquad (32)$$

In §5 it will be shown that the limiting form is exponential in general.
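
The rectangular example can be examined numerically. The Python sketch below takes the pooled interval density (2N - 1)(1 - y)^(2N-2) and the exponential form 2N exp(-2Ny) quoted above, and checks by midpoint integration that the former integrates to one with mean 1/(2N), the mean interval μ/N on the pooled output. The choice N = 20 and the helper names are illustrative assumptions.

```python
import math


def pooled_density_rectangular(y, n_sources):
    """Interval density on the pooled output for rectangular (0, 1) parents."""
    if not 0.0 <= y <= 1.0:
        return 0.0
    return (2 * n_sources - 1) * (1.0 - y) ** (2 * n_sources - 2)


def exponential_limit(y, n_sources):
    """Limiting exponential form with rate 2N."""
    return 2 * n_sources * math.exp(-2 * n_sources * y)


def integrate(func, lower, upper, steps=100_000):
    """Midpoint-rule integral of func over (lower, upper)."""
    h = (upper - lower) / steps
    return h * sum(func(lower + (i + 0.5) * h) for i in range(steps))


N = 20
total = integrate(lambda y: pooled_density_rectangular(y, N), 0.0, 1.0)
mean = integrate(lambda y: y * pooled_density_rectangular(y, N), 0.0, 1.0)

if __name__ == "__main__":
    print("integral:", total, "mean:", mean)
    for y in (0.0, 0.01, 0.05, 0.1):
        print(y, pooled_density_rectangular(y, N), exponential_limit(y, N))
```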

The equilibrium distribution for the number of events falling in an interval of length $t$
may be found by convoluting the distributions for the individual sources, so that in particular
the moment-generating function is

$$[M^e(\theta, t)]^N, \qquad (33)$$

and the equilibrium variance of the number of events in an interval is

$$V_p(t) = N V^e(t). \qquad (34)$$

Thus

$$V_p(t) \sim \frac{N(3\mu_2^2 - 2\mu\mu_3)}{6\mu^4} + C^2 t\lambda \quad \text{as } t \to \infty, \qquad (35)$$

where $\lambda^{-1} = \mu/N$ is the mean interval between events on the pooled output. Also

$$V_p(t) \sim t\lambda \quad \text{as } t \to 0, \qquad (36)$$

and the intercept of the asymptote is, if $f(x)$ is of the $\chi^2$ form,

$$\frac{N}{6}(1 - C^4).$$

Thus if we are given the $V_p(t)$ curve for a pooled output of the type we are considering,
$C$ may be estimated by

$$C^2 = \text{slope of asymptote} \times \text{mean interval between events}. \qquad (37)$$

To estimate $N$ some further assumption is required about the parent distribution $f(x)$.
For example, if an estimate is calculated from the intercept of the asymptote, the third
moment of $f(x)$ must be known. The simplest procedure is to assume that $f(x)$ is of the $\chi^2$
type, when

$$N = \frac{6 \times \text{intercept of asymptote}}{1 - C^4} \qquad (38)$$

is a suitable estimate. If $C$ is appreciably less than one the estimate is insensitive to changes
in the form of $f(x)$.
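
The two estimation rules can be written as a short routine. The sketch below, in Python, takes the slope and intercept of the asymptote of the pooled variance-time curve together with the mean interval between events, and returns C² = slope × mean interval and N = 6 × intercept / (1 - C⁴); the second rule assumes, as in the text, a parent distribution of the χ² type. The function name and the synthetic round-trip values are illustrative only.

```python
def estimate_c2_and_n(slope, intercept, mean_interval):
    """Estimate C^2 and N from the asymptote of a pooled variance-time curve.

    slope, intercept: slope and intercept of the large-t asymptote.
    mean_interval: mean interval between events on the pooled output.
    """
    c2 = slope * mean_interval
    if not 0.0 <= c2 < 1.0:
        raise ValueError("rule for N needs 0 <= C^2 < 1")
    n_sources = 6.0 * intercept / (1.0 - c2 ** 2)  # chi-square type parent assumed
    return c2, n_sources


# Synthetic round trip: N = 100 sources, C^2 = 1/3, individual mean interval 3.
true_n, true_c2, mu = 100.0, 1.0 / 3.0, 3.0
rate = true_n / mu                                 # events per unit time, pooled
slope = true_c2 * rate
intercept = true_n * (1.0 - true_c2 ** 2) / 6.0
c2_hat, n_hat = estimate_c2_and_n(slope, intercept, 1.0 / rate)

if __name__ == "__main__":
    print(c2_hat, n_hat)
```

As C² approaches one the denominator 1 - C⁴ tends to zero, which is one way of seeing why the method loses sensitivity when the sources are themselves nearly random.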


5. LARGE NUMBER OF SOURCES 


We now consider briefly the form of the pooled output when the number of sources is large.
We assume the weak condition (25) on the parent distribution, namely, that there exists
$\beta$ $(0 < \beta < 1)$ such that for small $t$

$$\int_0^t f(x)\,dx = O(t^{\beta}). \qquad (39)$$



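It follows that

$$M^0(\theta, t) = 1 + O(t^{\beta}), \qquad (40)$$

so that by (15)

$$M^e(\theta, t) = 1 - \frac{1 - e^{-\theta}}{\mu}\{t + O(t^{1+\beta})\}. \qquad (41)$$

Express $t$ as a multiple of the mean interval on the pooled output by putting

$$t = \frac{\mu\tau}{N}.$$

Then we have for the moment-generating function of the pooled output, by (33),

$$\left[M^e\!\left(\theta, \frac{\mu\tau}{N}\right)\right]^N = \left[1 - (1 - e^{-\theta})\left\{\frac{\tau}{N} + O\!\left(\frac{1}{N^{1+\beta}}\right)\right\}\right]^N \to \exp\{-\tau(1 - e^{-\theta})\}. \qquad (42)$$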


Therefore the number of events falling in an interval of length $\mu\tau/N$ tends to be distributed
in a Poisson distribution as $N$ tends to infinity. In particular, the interval between successive
events tends to be distributed in an exponential distribution, as can also be proved directly
from (31). It does not follow from (42) that pairs of intervals tend to be independent
although this is in fact true; we omit the proof.
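
The approach to the Poisson form can be illustrated numerically. The Python sketch below compares the leading term [1 - (1 - e^{-θ})τ/N]^N with exp{-τ(1 - e^{-θ})}, the generating function of a Poisson distribution with mean τ. It assumes the convention in which the generating function is the sum of e^{-kθ} p_k, and it drops the remainder term, so it shows only the elementary limit and not the full argument. The values θ = 1 and τ = 2 are arbitrary.

```python
import math


def pooled_mgf_leading_term(theta, tau, n_sources):
    """[1 - (1 - exp(-theta)) * tau / N]^N, remainder term omitted."""
    return (1.0 - (1.0 - math.exp(-theta)) * tau / n_sources) ** n_sources


def poisson_mgf(theta, tau):
    """Sum over k of exp(-k * theta) * Poisson(k; tau)."""
    return math.exp(-tau * (1.0 - math.exp(-theta)))


THETA, TAU = 1.0, 2.0
gaps = [
    abs(pooled_mgf_leading_term(THETA, TAU, n) - poisson_mgf(THETA, TAU))
    for n in (10, 100, 1000, 10000)
]

if __name__ == "__main__":
    for n, gap in zip((10, 100, 1000, 10000), gaps):
        print(n, gap)
```

The discrepancy falls roughly in proportion to 1/N in this example.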

Thus, for large $N$ the pooled output is ‘locally random’ in the sense that it has the properties
of a completely random series, provided that we consider the behaviour over
intervals $t$ small compared with the individual mean recurrence times $\mu$. This result may be
expected to hold for a large number of independent sources, irrespective of the precise 
nature of the output of the single sources. A further particular case has been considered by 
Cox & Smith (1953a). The local approach to complete randomness is of general interest, 
because there are numerous practical applications of stochastic processes in which complete 
randomness is customarily assumed in situations where a pooled output is being considered. 
For instance, the sequence of calls at a telephone exchange is the pooled output of the calls 
originating from individual subscribers. Hence the overall sequence may be expected to 
be locally random. 


6. APPLICATION 


We now consider the practical application of our results. Fatt & Katz, in the work referred 
to above, obtained a sequence of over 800 pulses, being the ‘pooled output’ of a number of 
active spots. They showed that the distribution of intervals between successive pulses was 
fitted satisfactorily by the exponential curve, to be expected for a completely random 
sequence, although they were careful to point out that the exponential curve was consistent 
with strictly periodic individual outputs. 

We consider three possible structures for the sequence: 

(i) the individual sources, and hence also the pooled output, are completely random; 
(ii) the sequence is the pooled output of N strictly periodic sources; 

(iii) the sequence is the pooled output of N independent sources of the type considered 
in the present paper. 

We wish to decide which possibility holds, in case (ii) to estimate the number, N, of 
active spots and in case (iii) to estimate both N and the coefficient of variation, C, of intervals 
on one source. 

To do this we use the variance-time curve, $V(t)$. If the sequence is completely random

$$V(t) = \lambda t, \qquad (43)$$

where $\lambda^{-1}$ is the mean interval between successive events in the pooled output. In case (ii),
$V(t)$ has been shown in a previous paper to oscillate for large $t$ about an average of $\tfrac{1}{6}N$. In
case (iii) we have shown (35) that

$$V(t) \sim C^2 t\lambda, \quad \text{for large } t, \qquad (44)$$

and that $C^2$ and $N$ can be estimated from the slope and intercept of the asymptote. In all
three cases

$$V(t) \sim t\lambda \quad \text{for small } t. \qquad (45)$$


Table 1

t (sec.)     $\hat{V}(t)$     s.e. of $\hat{V}(t)$     $\lambda t$

   2          11.2             1.41                     11.0
   4          23.1             3.45                     22.0
   8          39.9             9.39                     44.1
  16          54.1            26.8                      88.1
  32          83.4            79.4                     176.2
  48         189.7           154                       264.3


[Figure: $V(t)$ plotted against $t$ (sec.).]

Fig. 2. Variance-time curves: experimental values (crosses) and theoretical curves for a completely random
series, for 100 strictly periodic sources, and for 100 stochastic sources with $C = 1/\sqrt{3}$.



The estimation of $V(t)$ from experimental data has been described in the earlier paper
(Cox & Smith, 1953a). The experimental series is divided into sections of length $\tau$ and the
number of events in each section counted. (A suitable choice for $\tau$ is approximately one-half
the smallest value of $t$ for which $V(t)$ is required.) From the discrete time series so formed,
an estimate $\hat{V}(r\tau)$ of $V(r\tau)$ may be calculated by a process of successive addition and
squaring, that will not be described in detail here. When the sequence is completely random,
$\hat{V}(r\tau)$ is an unbiased estimate of $V(r\tau)$ and its standard error may be calculated.

For Fatt & Katz’s data we find the values given in Table 1 and illustrated in Fig. 2.
The mean interval between events is $\lambda^{-1} = 0.220$ sec. The period of observation is 174 sec.,
so that, for example, $t = 32$ sec. is an appreciable proportion of the series available for
analysis. This is why the standard error of $\hat{V}(32)$ is very large. In no case does $\hat{V}(t)$ differ
from its value $\lambda t$ for a completely random series by much more than its standard error, so
that there is no evidence that the series is not completely random.
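
The idea of building estimates of V(rτ) from section counts can be sketched in code. The Python function below is a simplified illustration only: it forms the sums of r consecutive section counts by successive addition and returns their ordinary sample variance. It is not the estimator of Cox & Smith (1953a), whose details (including the correction that makes it unbiased for a completely random series) are not given in the text; the function name and the toy data are hypothetical.

```python
def variance_time_estimate(counts, r):
    """Sample variance of the sums of r consecutive section counts.

    counts: number of events in successive sections of length tau.
    r: number of sections per block, so the block length is r * tau.
    Overlapping blocks are used; this is a simple moving-sum estimate.
    """
    n_blocks = len(counts) - r + 1
    if r < 1 or n_blocks < 2:
        raise ValueError("need r >= 1 and at least two blocks")
    window = sum(counts[:r])
    sums = [window]
    for i in range(r, len(counts)):
        window += counts[i] - counts[i - r]  # slide the block by one section
        sums.append(window)
    mean = sum(sums) / len(sums)
    return sum((s - mean) ** 2 for s in sums) / (len(sums) - 1)


if __name__ == "__main__":
    toy_counts = [3, 5, 4, 6, 2, 7, 5, 4, 6, 3]
    for r in (1, 2, 4):
        print(r, variance_time_estimate(toy_counts, r))
```

Because overlapping blocks are strongly correlated, the sampling error of such an estimate grows quickly once rτ is an appreciable fraction of the record, which is the effect noted above for t = 32 sec.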

If the number of sources were large, or the coefficient of variation were near 1, our method
would be insensitive. As examples we have drawn on Fig. 2 theoretical curves for

(a) 100 strictly periodic sources (approximate curve);

(b) 100 stochastic sources with $C = 1/\sqrt{3}$, the intervals on any source being proportional
to $\chi^2$ variables with six degrees of freedom.

If the method were to be used extensively it would be worth-while investigating the
distribution of $\hat{V}(t)$ for the alternative sequences of types (ii) and (iii), hence developing
methods of determining confidence intervals for $N$ and $C$.


We are extremely grateful to Prof. B. Katz, F.R.S., and Dr P. Fatt, University College, 
London, for supplying us with the experimental results analysed in §6, and for comments 
on the paper. 
CONTINUOUS INSPECTION SCHEMES


By E. S. PAGE
Statistical Laboratory, University of Cambridge


1. INTRODUCTION
1.1. Preliminary remarks


Whenever observations are taken in order it can happen that the whole set of observations 
can be divided into subsets, each of which can be regarded as a random sample from a 
common distribution, each subset corresponding to a different parameter value of the 
distribution. The problems to be considered in this paper are concerned with the identifica- 
tion of the subsamples and the detection of the changes in the parameter value. Such pro- 
blems can arise in a number of fields of application. For example, in an experiment in extra- 
sensory perception the proportion of correct answers given by a subject in response to 
a series of questions may change during the course of the experiment, and it may be desired 
to estimate the position of the change or to stop the experiment when a change is noticed. 
Again, in a psychological experiment in which a subject is required to guess the colour of 
the next ball to be drawn at random with replacement from a bag containing balls of two 
colours the subject’s proportion of correct guesses in a series of trials with the same bag 
of balls may change as he gains some knowledge of the constitution of the bag from the 
results of earlier guesses; it may be of interest to estimate the point at which the change took 
place. More widely known are the occurrences in industry of problems of detecting changes 
in the quality of the output from a continuous production process. Some such processes 
maintain an approximately constant quality of output for considerable periods; occasion- 
ally, probably because of a fault at some point of the process, the quality worsens and a 
large proportion of the output becomes unacceptable. The quality of the output may be 
assessed by some measurable characteristic (e.g. when the length of articles is normally 
distributed with constant variance the mean length may be used as an indication of quality), 
or by the fraction of the output that fails to meet given specifications. In general, it will be 
possible to assign a quality number, $\theta$, to the output which may be taken as a parameter
of the distribution. We are interested in the changes in $\theta$.

One of the simplest criteria for detecting a change in the mean, $\theta$, of the distribution is
a weighted sum of the last few, $k$ say, observations, i.e. a moving average. If $k$ is small,
large changes in $\theta$ will be detected rapidly but small changes only slowly; on the other hand,
a larger value of $k$ will be required for the best detection of small changes in $\theta$ but then large
changes will be noticed later owing to the moving average damping the effect of a single 
extreme observation. In general, the consequences of rules based on moving averages are 
difficult to evaluate. The theory of one such rule for identifying the subsamples in observa- 
tions from a binomial population of changing mean has been given by Anscombe, Godwin 
& Plackett (1947). 

It will be convenient in what follows to use the terminology of the industrial applications 
in view of its greater familiarity. 








1.2. Detection of a change in the parameter

The first problem to be considered here is that of detecting a change in the parameter $\theta$.
It is this problem that process inspection schemes are designed to solve; it is required to 
detect a deterioration in the quality of the output from a continuous production process. 
When such a deterioration is suspected some action is taken; for example, the production 
may be suspended and a machine reset. A widely used scheme consists of examining samples 
of a fixed size at regular intervals of time; a statistic of the sample (e.g. mean, range, or 
number of defectives) is plotted on a control chart and corrective action is taken if the point 
falls outside control limits drawn on the chart (e.g. Dudding & Jennett, 1942; Duncan, 1952; 
Shewhart, 1931). In most cases of practical interest there is a probability of one that some 
point will eventually fall outside the limits and action will then be taken even when there is 
no change in the quality number, $\theta$. If the fraction of the output that is sampled remains
constant the amount of output produced before action is taken is proportional to the 
number of articles inspected. We are led, therefore, to consider what we shall call the 
average run length function for a process inspection scheme. 


Definition. When the quality remains constant the average run length (A.R.L.) of a process
inspection scheme is the expected number of articles sampled before action is taken.

The A.R.L. is clearly a function of $\theta$, the quality number.

When the quality of the output is satisfactory the A.R.L. is a measure of the expense
incurred by the scheme when it gives false alarms, i.e. Type I errors (Neyman & Pearson,
1936). On the other hand, for constant poor quality the A.R.L. measures the delay and
thus the amount of scrap produced before the rectifying action is taken, i.e. Type II errors.
This measure is one of several suggested by Aroian & Levene (1950); another is the probability
that action is taken within $n$ observations.

For the control chart scheme described, on constant quality, the number, $r$, of samples
each of $N$ articles examined before action is taken is a geometric variable such that

$$\Pr(r = k) = P^{k-1}(1 - P), \quad \text{where } P = P(\theta),$$

the probability that a given sample point falls between the control limits. $E(r) = (1 - P)^{-1}$
and the A.R.L. function is

$$L(\theta) = N/(1 - P). \qquad (1)$$
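
A concrete case may make the A.R.L. function easier to read. The Python sketch below evaluates L = N/(1 - P) for a chart on the mean of a normal population with known standard deviation and control limits at μ ± 3.09σ/√N. The normal model, the 3.09 limits (borrowed from the example quoted in the footnote of this section) and the function names are assumptions made for illustration; the shift of the process mean is expressed in units of σ.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def average_run_length(shift, sample_size, limit=3.09):
    """A.R.L. N / (1 - P) for a mean chart with limits at +/- limit * sigma / sqrt(N).

    shift: displacement of the process mean from target, in units of sigma.
    sample_size: number of articles N in each sample.
    """
    delta = shift * math.sqrt(sample_size)
    # P = probability that a sample mean falls between the control limits.
    p_inside = normal_cdf(limit - delta) - normal_cdf(-limit - delta)
    return sample_size / (1.0 - p_inside)


if __name__ == "__main__":
    for shift in (0.0, 0.25, 0.5, 1.0, 2.0):
        print(shift, round(average_run_length(shift, 5), 1))
```

With no shift and N = 5 the A.R.L. is about 2500 articles, and it can never fall below N however large the shift, since at least one full sample must be inspected.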


Although the results of previous samples are recorded on the chart none is used by the 
above process inspection rule; in order to avoid some of this loss, warning lines within the 
control limits are often drawn on the chart and the rule amended to read: ‘Take action if
any point falls outside the control limits or if $l$ out of a sequence of $m$ points fall outside the
warning lines.’ The A.R.L. for rules of this type can be calculated by enumerating the
possible combinations of the positions of the last $m - 1$ points and treating them as the states
of a discrete Markov process; the methods described, for example, in Bartlett’s paper (1953),
will then give the expected number of samples before a combination demanding action is
observed.† Such schemes will not be considered further here.


† When a chart is used for controlling the mean of a normal population, statements are sometimes 
made in the literature such as: '4 out of 16 points falling outside warning limits at $\mu \pm 1.96\sigma/\sqrt{N}$ are 
equivalent to one point outside control limits at $\mu \pm 3.09\sigma/\sqrt{N}$.' If the process mean is $\mu$, these two events 
have the same probability, but rules using one or both of these criteria for action will have different 
A.R.L. functions and therefore a different effect in practice. 
The process inspection rules described so far are based either on a single point recorded 
on the control chart or on a fixed number of the most recently recorded points. Such rules 
will therefore fail to make use of all the information that is available on the chart. Again, 
with these rules the best sample size and the position of the control limits vary with the 
magnitude of the change in the quality number that it is important to detect (Page, 1954). 
Consequently a rule that is optimum for a certain magnitude of change will not be optimum 
for other magnitudes of change, and in some cases the loss in efficiency may be serious. For 
example, if the best sample size for a single sample scheme to detect a change from $\theta_0$ to $\theta_1$ 
is $N$, the A.R.L. of this scheme is not less than $N$ for any value of $\theta$; it may be possible by 
some other method to detect a large change in fewer than $N$ observations. In the following 
sections we develop rules that use all the observations since action was last taken and that 
are suitable for the detection of any magnitude of change in the parameter. 


2. ONE-SIDED PROCESS INSPECTION SCHEMES 

2-1. A transition scheme 
Process inspection schemes that are designed to detect deviations in $\theta$ in only one direction 
will be called one-sided schemes. We consider first a simple example of a rule to control the 
fraction of defective articles produced by an industrial process. 

Suppose samples of twenty articles are taken every hour; let $d_n$ be the number of defectives 
in the $n$th sample after action was last taken. Consider the rule: 

Take action if $d_n \ge 3$, 
            or $d_n + d_{n-1} \ge 4$, 
            or $d_n + d_{n-1} + d_{n-2} \ge 5$,     (2) 
            etc., 
i.e. $\sum_{i=0}^{r} d_{n-i} \ge 3 + r$ for at least one $r$ $(0 \le r \le n - 1)$. 

After each sample is taken, the total number of defectives in the last one, two, three, etc., 
samples is examined and action is taken if, over any sequence of samples, the average 
number of defectives per sample is ‘much’ more than one. If 5% defective is the critical 
quality level for the process, so that any worse quality requires action to be taken, the 
above rule would be suitable. 

With this rule the decision whether or not to take action is made after each sample and 
all the previous samples are used in making the decision. This rule may therefore be regarded 
as transitional between those rules for which the decisions are made after each sample on 
the results of a fixed number of samples and those for which a decision is made after each 
observation considering all previous observations (the latter type of rule will be called a 
sequential process inspection scheme). 

The action criteria (2) may be specified in a different way. Let a score $+19$ be assigned 
to each defective occurring in a sample, and $-1$ for each non-defective. Let $S_n = \sum_{i=1}^{n} x_i$, 
where $x_n$ is the total score in the $n$th sample after action was last taken. Then (2) is equivalent 
to: 

Take action if $S_n - S_{n-r} \ge 40$ for any $r$ $(1 \le r \le n)$,     (3) 

or, what is again equivalent, 

Take action if $S_n - \min_{0 \le i < n} S_i \ge 40$.     (4) 
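
The equivalence of criteria (2) and (4) follows because a sample of twenty articles with $d$ defectives scores $19d - (20 - d) = 20d - 20$, so that a run of $r + 1$ samples scores at least 40 exactly when it contains at least $3 + r$ defectives. The following sketch checks this numerically; the function names and the convention $S_0 = 0$ (the empty sum) are assumptions made for illustration.

```python
import random

def rule2_triggers(d):
    """Criterion (2): some run of the last r+1 samples holds >= 3 + r defectives."""
    n = len(d)
    return any(sum(d[n - 1 - r:]) >= 3 + r for r in range(n))

def rule4_triggers(d, sample_size=20, h=40):
    """Criterion (4): S_n - min_{0 <= i < n} S_i >= h with scores +19 / -1 per article."""
    scores = [19 * k - (sample_size - k) for k in d]   # total score of each sample
    S = [0]                                            # S_0 = 0
    for x in scores:
        S.append(S[-1] + x)
    return S[-1] - min(S[:-1]) >= h

if __name__ == "__main__":
    rng = random.Random(1)
    d = [rng.choice([0, 0, 1, 1, 2, 3]) for _ in range(10)]
    for n in range(1, len(d) + 1):
        print(n, d[:n], rule2_triggers(d[:n]), rule4_triggers(d[:n]))
```

Both functions are evaluated on the defectives counted since action was last taken, so in use the list `d` would be cleared after each action.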





When the cumulative score $S_n$ is plotted on a chart the mean path when the fraction 
defective, $\theta$, is constant and less than 5% is below the horizontal. If $\theta > 0.05$, the mean path 
is above the horizontal so that the action criterion (4) can be expected to be satisfied 
speedily. Accordingly this rule would be appropriate for controlling a process for which 
more than 5% defectives could not be tolerated. 

In the above scheme when the quality is satisfactory the mean path on the cumulative 
sum diagram is downwards, and a deterioration in quality is detected by an upward change 
of direction in the mean path. A similar criterion for controlling the parameter of other 
distributions when deviations in only one direction are important is provided by the 
following rule. 


Rule 1. Take samples of fixed size $N$ at regular intervals; assign a score $x_k$ to the $k$th 
sample and plot the cumulative score $S_n = \sum_{k=1}^{n} x_k$ on a chart: 

Take action if $S_n - \min_{0 \le i < n} S_i \ge h$.     (5) 


The A.R.L. for this rule will be obtained as a special case of that for the rule of the next 
section. 


2-2. A sequential scheme 

In what follows we shall suppose that observations are recorded singly at regular intervals, 
and that a decision whether or not to take action because of a suspected deterioration in the 
output is made after each observation. A rule fulfilling these requirements is obtained by 
modifying Rule 1. 

Rule 2. Take observations at regular intervals; assign a score $x_k$ to the $k$th observation 
and plot the cumulative score $S_n = \sum_{k=1}^{n} x_k$ on a chart. Take action if (5) is satisfied. 

The system of scoring is chosen so that the mean sample path on the chart when quality 
is satisfactory is downwards, i.e. of negative gradient, and is upwards when quality is 
unsatisfactory (Fig. 1). 

In order to evaluate the A.R.L., define 

$$S'_n = \max(S'_{n-1} + x_n, 0) \quad (n \ge 1), \qquad S'_0 = 0, \tag{6}$$

so that $S'_n = 0$ whenever $S_n \le \min_{0 \le i < n} S_i$. The condition (5) is then equivalent to: 

Take action after the $n$th observation if $S'_n \ge h$.     (7) 
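
The recursion (6) means that only the current value of $S'_n$ needs to be stored. A minimal sketch, with hypothetical function names and assuming $h > 0$, compares the first action time under criterion (5) with that under criterion (7):

```python
def cusum_path(scores):
    """S'_n of (6) for n = 1, ..., len(scores), starting from S'_0 = 0."""
    path, s = [], 0.0
    for x in scores:
        s = max(s + x, 0.0)
        path.append(s)
    return path

def first_action_direct(scores, h):
    """First n with S_n - min_{0 <= i < n} S_i >= h (criterion (5)), or None."""
    S, running_min = 0.0, 0.0          # S_0 = 0 is included in the minimum
    for n, x in enumerate(scores, start=1):
        S += x
        if S - running_min >= h:
            return n
        running_min = min(running_min, S)
    return None

def first_action_recursive(scores, h):
    """First n with S'_n >= h (criterion (7)), or None."""
    for n, s in enumerate(cusum_path(scores), start=1):
        if s >= h:
            return n
    return None

if __name__ == "__main__":
    scores = [-1, 2, -1, 3, 2]
    print(cusum_path(scores))                       # [0.0, 2.0, 1.0, 4.0, 6.0]
    print(first_action_direct(scores, 5), first_action_recursive(scores, 5))
```

Integer scores are used in the example so that the two criteria can be compared without rounding effects.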


It can now be seen that this rule breaks up into a sequence of Wald sequential tests with 
boundaries at $(0, h)$ and initial score zero.† The test is reapplied when the previous test 
ends on the lower boundary and action is taken when a test ends on the upper boundary. 

Notation. Let the probability that a Wald test with boundaries $(0, h)$ and initial score $Z$ 
ends on the lower boundary be $P(Z)$ and the average sampling numbers, unconditional, 
conditional upon the test ending on the lower boundary, and conditional upon the test 
ending on the upper boundary, be $N(Z)$, $N_a(Z)$ and $N_r(Z)$ respectively. A test that ends upon 
the lower boundary will be called an 'acceptance test'; conversely for a 'rejection test'. 


† The procedure in which observations $x_1, x_2, \ldots$ are taken so long as $a < Z_i < b$, where $Z_i = Z_{i-1} + x_i$, 
$Z_0 = z_0$, will be termed a Wald sequential test with boundaries $(a, b)$ and initial score $z_0$. 
The probability that there are $r$ acceptance tests before a rejection test occurs is 
$\{P(0)\}^r \{1 - P(0)\}$, and the expected number of such tests is therefore 

$$\sum_r r \{P(0)\}^r \{1 - P(0)\} = P(0)/\{1 - P(0)\}. \tag{8}$$

It follows that the A.R.L. for this rule is 

$$\frac{N(0)}{1 - P(0)}. \tag{9}$$
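
Relation (9) can be checked exactly when the scores are integers. The sketch below assumes the binomial scoring of § 5 with $a = 1$ (score $-1$ with probability $q$, score $+b$ with probability $p$) and an integer $h$; it obtains $P(0)$ and $N(0)$ by conditioning on the first observation, and compares $N(0)/\{1 - P(0)\}$ with the mean first passage time of $S'_n$ computed directly. The function names and the use of exact rational arithmetic are illustrative choices, not the author's method.

```python
from fractions import Fraction

def solve(A, rhs):
    """Solve A x = rhs exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def wald(p, b, h):
    """P(z) and N(z), z = 1..h-1, for the Wald test with boundaries (0, h)."""
    q, n = 1 - p, h - 1
    A = [[Fraction(0)] * n for _ in range(n)]
    rhs_P, rhs_N = [Fraction(0)] * n, [Fraction(1)] * n
    for z in range(1, h):
        A[z - 1][z - 1] += 1
        for step, prob in ((-1, q), (b, p)):
            y = z + step
            if y <= 0:
                rhs_P[z - 1] += prob          # test ends on the lower boundary
            elif y < h:
                A[z - 1][y - 1] -= prob       # test continues from score y
            # y >= h: test ends on the upper boundary, nothing to add
    return solve(A, rhs_P), solve(A, rhs_N)

def arl_from_wald(p, b, h):
    """A.R.L. from relation (9), with P(0) and N(0) found from the first observation."""
    P, N = wald(p, b, h)
    Pb = P[b - 1] if b < h else 0
    Nb = N[b - 1] if b < h else 0
    P0 = (1 - p) + p * Pb
    N0 = 1 + p * Nb
    return N0 / (1 - P0)

def arl_direct(p, b, h):
    """Mean first passage time of S'_n from 0 to the states S' >= h."""
    q = 1 - p
    A = [[Fraction(0)] * h for _ in range(h)]
    for z in range(h):
        A[z][z] += 1
        A[z][max(z - 1, 0)] -= q              # holding barrier at zero
        if z + b < h:
            A[z][z + b] -= p
    return solve(A, [Fraction(1)] * h)[0]

if __name__ == "__main__":
    p = Fraction(1, 20)
    print(float(arl_from_wald(p, 19, 40)), float(arl_direct(p, 19, 40)))
```

The two routes agree exactly because every state $S' = 0$ is a point at which a new Wald test starts.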
Fig. 1. Control diagram ($S_n$ plotted against the number of observations, with the distance $h$ marked). 
× × sample points; solid line: mean path on satisfactory quality; dotted line: mean path on 
unsatisfactory quality. 


An approximation to the distribution of the run length, $l$, may be found when $P(0) \approx 1$, 
i.e. when the number of acceptance tests before action is taken is large. If $l_a$ is the total 
number of observations in the acceptance tests its characteristic function (C.F.) is 

$$\mathcal{E}(e^{itl_a}) = \frac{1 - P(0)}{1 - P(0)\phi_a(t)}, \tag{10}$$

where $\phi_a(t)$ is the C.F. of the number of observations in a single acceptance test. By 
considering the dominant terms in the repeated differentiation of (10) it follows that, for $P(0)$ 
near 1, the moments of $l_a/N_a(0)$, and hence those of $l/N_a(0)$, are approximately those of a 
geometric distribution with mean $P(0)/\{1 - P(0)\}$. Hence 

$$\Pr(l \le n) \approx 1 - \{P(0)\}^{\nu}, \tag{11}$$

where $\nu = n/N_a(0)$. 


If two independent rules of this type are operated simultaneously (as, for example, when 
it is desired to control both mean and variance of a normal population for one-sided 
deviations) the run length of the combination is distributed as the minimum of two independent 
run lengths, $l$ and $l^*$; using the approximation (11) the A.R.L. of the combined rules will be 

$$\Lambda, \quad \text{where } \Lambda^{-1} = L^{-1} + L^{*-1}. \tag{12}$$

This relation can be easily seen by considering the stops due to the two rules as occurring 
at random at rates $L^{-1}$ and $L^{*-1}$. 
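
Approximations (11) and (12) are simple enough to tabulate directly. A minimal sketch follows; the function and argument names are hypothetical, and the inputs $P(0)$, $N_a(0)$, $L$ and $L^*$ are assumed to have been obtained separately.

```python
def run_length_cdf(n, P0, Na0):
    """Approximation (11): Pr(l <= n) ~ 1 - P(0)**nu with nu = n / N_a(0)."""
    return 1.0 - P0 ** (n / Na0)

def combined_arl(L, L_star):
    """Approximation (12): 1/Lambda = 1/L + 1/L* for two independent rules."""
    return 1.0 / (1.0 / L + 1.0 / L_star)

if __name__ == "__main__":
    print(run_length_cdf(500, P0=0.99, Na0=5.0))   # chance of action within 500 observations
    print(combined_arl(400.0, 600.0))              # 240.0 up to rounding
```

The combined A.R.L. is always smaller than either component, as expected for the minimum of two run lengths.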

In order to calculate the A.R.L. for the transition scheme, the score assigned to each 
sample may be regarded as a single observation; the average number of samples is therefore 
given in terms of the characteristics of the appropriate Wald test by equation (9). 


2-3. An integral equation for the A.R.L. 

In general the operating characteristic (O.C.) and average sample number functions of a 
Wald test are the solutions of integral equations of Fredholm's type. For example, the 
integral equation for the average sample number $N(Z)$ of a Wald test with boundaries 
$(a, b)$ and initial score $Z$ is 

$$N(Z) = 1 + \int_a^b N(y) f(y - Z)\,dy.$$

Using (9) it is therefore possible to calculate the A.R.L. of a general scheme as the ratio of 
the solutions of two integral equations. We can, however, derive a single integral equation 
satisfied by the A.R.L., and to do this we introduce an extension to Rule 2. 

Rule 3. Take observations and assign scores as for Rule 2. Take action if either 

(a) $S_n \ge h$ and $S_i > 0$ for $i = 1, 2, \ldots, n - 1$, 

or (b) $S_n - \min_{0 \le i < n} S_i \ge h$, 

where $S_0 = Z$ $(0 \le Z < h)$ and $S_n = S_{n-1} + x_n$.     (13) 


The rule modifies Rule 2 only near its start. A formulation similar to (6) shows that a 
Wald test with boundaries $(0, h)$ and initial score $Z$ is first applied; if it ends in acceptance, 
Rule 1 then operates. 

Let the distribution function of a single score, $x$, be $F(x)$, and let $L(Z)$ be the A.R.L. of 
Rule 3. The A.R.L. of Rule 2 is clearly $L(0)$. By considering the consequences of the first 
observation and taking expectations we obtain the equation, holding for $0 \le Z < h$, 

$$L(Z) = 1 + L(0) F(-Z) + \int_0^h L(x)\,dF(x - Z). \tag{14}$$
0 


On substituting for $L(0)$, this equation reduces to the Fredholm form, although the 
convolution kernel is lost. 

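
Equation (14) can be solved numerically by replacing the integral with a finite sum. The sketch below is one possible discretization, not a method given in the paper: it assumes normally distributed scores with unit variance, divides $(0, h)$ into `m` cells, treats $L$ as constant on each cell, and solves the resulting linear equations for $L(0)$ and the cell values. The function name `cusum_arl`, the cell count and the normality assumption are all illustrative.

```python
from statistics import NormalDist

def cusum_arl(mean, h, m=80):
    """Approximate L(0) of equation (14) when a single score is N(mean, 1)."""
    F = NormalDist(mean, 1.0).cdf
    w = h / m
    mids = [(j + 0.5) * w for j in range(m)]   # cell mid-points in (0, h)
    starts = [0.0] + mids                      # unknowns: L(0), L(mids[0]), ...
    n = m + 1
    A = []
    for i, Z in enumerate(starts):
        row = [0.0] * n
        row[0] = -F(-Z)                        # score falls to zero or below: restart at 0
        for j, y in enumerate(mids):
            row[j + 1] = -(F(y + w / 2 - Z) - F(y - w / 2 - Z))
        row[i] += 1.0
        A.append(row + [1.0])                  # augmented with the right-hand side 1
    # Gaussian elimination with partial pivoting.
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            if f:
                for k in range(c, n + 1):
                    A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / A[r][r]
    return x[0]

if __name__ == "__main__":
    print(cusum_arl(-0.5, 4.0))   # mean path downwards: long run length
    print(cusum_arl(+0.5, 4.0))   # mean path upwards: short run length
```

Increasing `m` refines the approximation at the cost of a larger linear system.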
2-4. General remarks 

In the last two sections we have been studying the expected values of a random variable, 
$l(S, m)$, the number of observations required to satisfy (5) when $S$ is the cumulative score 
and $m$ the previous minimum. It is plain that $l(S, m)$ is a Markov process in two dimensions 
where, in general, the states to which the process may move at the next observation are 
$(S_1, m)$ and $(m_1, m_1)$, where $S_1 > m$ and where $m_1 \le m$. When part of the information is 
suppressed, e.g. the value of $m$, the resulting one-dimensional process is no longer Markovian, 
but this property is regained if the scoring system is that of (6). The states are now defined 
by the $S'$ score. It follows that every state $S'$ is a regeneration point (Kendall, 1951; Lindley, 
1951) of the process. It may also be regarded as a random walk between an absorbing and 
a 'holding' barrier; when $S'_{n-1} + x_n < 0$, the particle is held at zero until the next observation 
is taken. The A.R.L. is the mean absorption time, i.e. the mean first passage time to the 
states $S' \ge h$. 
3. TWO-SIDED SCHEMES 
3-1. Simultaneous application of two one-sided schemes 
The schemes of § 2 are adequate for the detection of changes in the quality number in one 
direction only. It is often necessary, e.g. in controlling the mean of a normal population, 
to detect changes in either direction. A procedure immediately suggesting itself from the 
previous work is to use two one-sided schemes, one to detect a decrease, and the other an 
increase, in the quality number. The rule is then: 

Rule 4. Take action after the $n$th article sampled if either 

$$S_n - \min_{0 \le i < n} S_i \ge h \quad \text{or} \quad \max_{0 \le i < n} S_i - S_n \ge k. \tag{15}$$

When this rule is specified by the two co-ordinates $r_n = S_n - \min S_i$, $s_n = \max S_i - S_n$, the 
run length $l(r, s)$ is a two-dimensional Markov process and the A.R.L. is the mean first passage 
time to the states $(r, s)$ when $r \ge h$ or $s \ge k$. An integral equation for the A.R.L. may be derived 
by introducing a scheme similar to Rule 3, but the equation is more complicated than (14). 
Alternatively, simultaneous integral equations for $L(r, 0)$ and $L(0, s)$ may be written down 
in terms of the characteristics of certain Wald tests and the distributions of the overlap of 
the boundaries at termination. In view of these complications we consider an alternative 
scheme. 
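
Although the A.R.L. of Rule 4 is awkward to obtain, the rule itself is easy to apply. A minimal sketch, assuming $S_0 = 0$ and using the hypothetical labels "increase" and "decrease" for the two kinds of signal:

```python
def two_sided_action(scores, h, k):
    """Apply criterion (15); return (n, direction) at the first action, else None."""
    S = 0
    lo = hi = 0                     # min and max of S_0, ..., S_{n-1}
    for n, x in enumerate(scores, start=1):
        S += x
        if S - lo >= h:
            return n, "increase"    # upward change in the mean path
        if hi - S >= k:
            return n, "decrease"    # downward change in the mean path
        lo, hi = min(lo, S), max(hi, S)
    return None

if __name__ == "__main__":
    print(two_sided_action([1, 1, 1, 1], h=3, k=3))    # (3, 'increase')
    print(two_sided_action([1, -2, -2], h=3, k=3))     # (3, 'decrease')
```

Only the current score together with the running minimum and maximum has to be carried forward between observations.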

3-2. A two-sided Wald test 

Barnard (1947) has suggested a sequential test for discriminating between more than two 
hypotheses, and special cases have been considered by Armitage (1947, 1950) and Sobel & 
Wald (1949). In this section we specify the particular case needed for the process inspection 
scheme of § 3-3. 

Consider the simultaneous application of two Wald tests, one with initial score on the 
acceptance boundary and the other with initial score on the rejection boundary. This 
procedure may be represented graphically as in Fig. 2. The test continues until both the 
simple Wald tests are ended. 

Let the probability that the test ends in the region $H_i$ be $P(H_i)$. Then Sobel & Wald have 
shown that $P(H_1)$ = O.C. for the first Wald test = $P'$, say. Similarly $P(H_3) = 1 - P''$, so 
that $P(H_2) = P'' - P'$. In the general case the average sample number (A.S.N.) of the 
two-sided test is difficult to obtain, but when two boundaries pass through the origin it may 
be expressed simply in terms of the A.S.N.'s of the composing Wald tests. 

Let n, n’ and n” be the number of observations required for a decision in the two-sided 
test and the simple Wald tests respectively in a realization and let their expectations be 
denoted by the capital letter with the corresponding dashes. 


Then 
$$n = \max(n', n'') = \max(n', n'') + \min(n', n'') - 1,$$
since the first observation causes at least one of the simple tests to terminate. 
Hence $n = n' + n'' - 1$, 
and therefore 
$$N = N' + N'' - 1. \tag{16}$$


This equation may also be derived from the integral equations by considering the result 
of the first observation. The simplification leading to (16) arises because the composing 
tests are independent after the first observation. 
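
The identity $n = n' + n'' - 1$ behind (16) can be seen in a small simulation. The sketch below assumes horizontal boundaries, with the first simple test having boundaries $(0, h)$ and the second $(-k, 0)$, both starting from zero, and normal scores with unit variance; these choices and the function name are assumptions made for illustration.

```python
import random

def two_sided_wald(rng, h, k, mean=0.0):
    """Run both simple tests on the same observations; return (n, n1, n2)."""
    z, n, n1, n2 = 0.0, 0, None, None
    while n1 is None or n2 is None:
        n += 1
        z += rng.gauss(mean, 1.0)
        if n1 is None and (z <= 0.0 or z >= h):
            n1 = n                  # first simple test, boundaries (0, h)
        if n2 is None and (z <= -k or z >= 0.0):
            n2 = n                  # second simple test, boundaries (-k, 0)
    return n, n1, n2                # n is the larger of n1 and n2

if __name__ == "__main__":
    runs = [two_sided_wald(random.Random(s), 4.0, 4.0) for s in range(2000)]
    N = sum(r[0] for r in runs) / len(runs)
    N1 = sum(r[1] for r in runs) / len(runs)
    N2 = sum(r[2] for r in runs) / len(runs)
    print(N, N1 + N2 - 1)           # equal up to rounding, since the identity holds path by path
```

Because the identity holds for every realization, the sample averages satisfy (16) up to floating-point rounding, not merely in expectation.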





3-3. A scheme based on the two-sided test 
A two-sided process inspection scheme can now be simply specified in terms of the test $T$, 
say, described in § 3-2. 


Rule 5. Apply $T$: if it ends in region $H_2$, reapply $T$; otherwise take action. 

As for (9) we obtain the A.R.L. 

$$\frac{N}{1 - P'' + P'}. \tag{17}$$

This scheme is a modification of that based on Rule 4 in which suspected fluctuations in 
quality are investigated in one direction at a time. In the special case where all the 
boundaries are horizontal the rule can be stated in terms of maximum and minimum cumulative 
scores as for Rule 1, making application especially easy; even in the more general case 
acceptance and rejection scores can be recorded beforehand (cf. Wald, 1947, p. 93), so that only 
little calculation is necessary when carrying out the rule. The probability that the scheme 
ends in the $H_3$ region is 

$$\frac{1 - P''}{1 - P'' + P'}, \tag{18}$$

so that curves showing the probability that the scheme will indicate the correct direction 
of the change may be computed from the O.C.'s of the composing Wald tests. 
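
Formulae (17) and (18) need only the three characteristics $N$, $P'$ and $P''$ of the two-sided test. A minimal sketch, with hypothetical function and argument names (`P1` for $P'$, `P2` for $P''$), assuming these characteristics have already been computed:

```python
def two_sided_arl(N, P1, P2):
    """A.R.L. of Rule 5 from (17): N / (1 - P'' + P')."""
    return N / (1.0 - P2 + P1)

def prob_ends_in_H3(P1, P2):
    """Probability (18) that the scheme ends in the H_3 region."""
    return (1.0 - P2) / (1.0 - P2 + P1)

if __name__ == "__main__":
    # Illustrative values only: P(H_1) = 0.25, P(H_2) = 0.5, P(H_3) = 0.25.
    print(two_sided_arl(10.0, P1=0.25, P2=0.75))   # 20.0
    print(prob_ends_in_H3(P1=0.25, P2=0.75))       # 0.5
```

The denominator $1 - P'' + P'$ is the probability that a single application of $T$ ends outside $H_2$, so (17) has the same form as (9).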


Fig. 2. A two-sided sequential test: $S_n$ plotted against the number of observations, with 
'stop sampling' regions separated by two 'continue sampling' bands. ○ ○ and × × mark 
sample paths ending in two of the regions $H_i$. 


4. MISCELLANEOUS REMARKS ON PROCESS INSPECTION 
4-1. Modifications of the rules 
When one of the above rules has indicated a change in the quality of the output it may be 
desired to confirm the suspicions by an increased rate of testing. A possible extension to 
Rule 2 is as follows: 
Inspect a fraction $f$ of the output and apply Rule 2 until condition (5) is satisfied; then 
inspect a fraction $f^*$ until the cumulative score $S_n$ rises a further $h_1$ or falls $h_2$. In the former 
case, take rectifying action, and in the latter, resume inspecting a fraction $f$ and reapply 
Rule 2. 
Another requirement may be to inspect the run of output that leads to the action being 
taken; this would entail holding production in 'bond' until it became clear that it was 
satisfactory. For example, with Rule 2, the output would be in bond until the current $S_n$ 
was the minimum; the average amount held would be $N_a(0)/f$ while production continued 
and $N_r(0)/f$ when the action was taken. 

In practice it might be inconvenient to examine single articles at frequent intervals so 
that sequential inspection schemes (in the sense of § 2-1) will not be suitable. In such a case 
a scheme of the transition type may be applied and the size of sample chosen as small as 
conveniently possible. An alternative procedure is to inspect a fixed number of articles on 
each occasion but to apply a sequential rule as each article is examined; a decision to take 
action will be delayed by an average time equivalent to about half the sample size. Another 
method of grouping is to inspect articles on each occasion until a specified number of the 
composing Wald tests have terminated or until action is required, whichever is the sooner. 
The fraction of output sampled would then be a function of the quality number and, for 
a given $\theta$, could be adjusted by the choice of scheme. 


4-2. Repeated tests 

It has been shown how Rule 2 is equivalent to the repeated application of Wald tests with 
initial score on the acceptance boundary, and the A.R.L. has been evaluated using that fact. 
Process inspection schemes can be obtained by repeatedly applying general Wald tests and 
their A.R.L.'s found in the same manner. When the quality is uniform the behaviour of such 
schemes is comparable with that of Rules 2 and 3. There is a difference, however, when 
a deterioration in quality occurs; it is not clear what is the effect on the A.R.L. should an 
abrupt change in $\theta$ take place in the middle of one of the tests. When the tests have initial 
score on the acceptance boundary we have that $L(Z) \le L(0)$ for Rules 2 and 3 so that the 
average number of observations until the action is taken is certainly no greater than the 
A.R.L. of Rule 2 on the new quality. No such statement can be made for more general tests. 

Other process inspection schemes may be obtained by the repeated application of dif- 
ferent tests or sequences of tests; for example, the control chart method is that of repeated 
fixed sample size tests. 

4-3. Estimation problems 


If it is required to estimate $\theta$ when a process inspection scheme has indicated a change, in 
practice it may be most convenient to take a further sample for this purpose. Alternatively, 
an estimator of sorts can be obtained by equating the number of observations in the 
rejection test to the appropriate conditional A.S.N. function, $N'(\theta)$, say. If there is no root of this 
equation the estimate can be $\theta^*$, where $N'(\theta^*)$ is the maximum of $N'(\theta)$; if there are several 
roots, which will be the case in general, some rule such as 'take the largest root' can be used. 

Since a process inspection scheme eventually suggests a change in $\theta$ whether one has 
occurred or not a rule for estimating the position of the change may be misleading. If it 
is known that a change has occurred somewhere within a set of $n$ observations Lindley's 
Method of Minimum Unlikelihood (1952) will provide an estimator. If it is further known 
that the change is from $\theta_0$ to $\theta_1$ a simple unbiased estimator, not necessarily integral or 
within the range $(0, n)$, may be obtained from the meet of the mean paths from zero and 
$S_n$, i.e. we take 

$$m = \Big(n\theta_1 - \sum_{1}^{n} x_i\Big) \Big/ (\theta_1 - \theta_0). \tag{19}$$

In this case the maximum-likelihood estimator is the minimum (maximum if $\theta_0 > \theta_1$) of 
the sample path for Rule 1 when the scoring is that of the sequential-likelihood ratio test, 

$$z_i = \log \frac{f(x_i, \theta_1)}{f(x_i, \theta_0)}. \tag{20}$$
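
With the likelihood-ratio scoring, the likelihood of a change immediately after observation $m$ is proportional to $\exp(S_n - S_m)$, so that it is maximized where the cumulative score is least. The sketch below illustrates this for the special case of normal observations with known variance; the function name and the normal model are assumptions made for illustration. With the scores always taken as $\log\{f(x, \theta_1)/f(x, \theta_0)\}$, the minimum is the relevant point for either direction of change.

```python
def change_point_mle(xs, theta0, theta1, sigma=1.0):
    """Position m (0 <= m <= n) of the minimum of the cumulative log likelihood ratio.

    Observations 1..m are attributed to theta0 and m+1..n to theta1.
    Assumes x ~ N(theta, sigma**2); S_0 = 0 is included, and ties go to the earliest m.
    """
    S, best_S, best_m = 0.0, 0.0, 0
    for i, x in enumerate(xs, start=1):
        # log f(x, theta1) - log f(x, theta0) for the normal density
        z = ((theta1 - theta0) * x - (theta1 ** 2 - theta0 ** 2) / 2.0) / sigma ** 2
        S += z
        if S < best_S:
            best_S, best_m = S, i
    return best_m

if __name__ == "__main__":
    print(change_point_mle([0, 0, 0, 0, 3, 3, 3], theta0=0.0, theta1=3.0))   # 4
    print(change_point_mle([3, 3, 0, 0], theta0=3.0, theta1=0.0))            # 2
```

A returned value of $n$ corresponds to no change having occurred within the set of observations.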


5. PROCESS INSPECTION FOR FRACTIONS DEFECTIVE 
5-1. Average run length 
In the case where the articles produced are classified as defective or non-defective explicit 
formulae for the O.C. and A.S.N. of the Wald test have been given by Burman (1946) and 
Walker (1950). Let the score for a non-defective article be $-a$ and that for a defective $b$, 
where $a$ and $b$ are integers. The process inspection scheme is given by Rule 2. 

The difference equations for the A.R.L. to which the integral equation (14) reduces in this 
case may be solved by a series transform method as in Walker's paper. We shall, however, 
use relation (9) and Burman's formulae in which $a = 1$ and it will be convenient to take the 
formulae in the limit as the fraction defective, $p \to 0$, $b \to \infty$ and $pb = X$. It is necessary to 
modify these formulae, which do not hold on the boundary, to obtain $P(0)$, $N(0)$. We have, 
by taking expectations conditional upon the first observation, 

$$P(0) = q + pP(b) \tag{21}$$
and 
$$N(0) = 1 + pN(b). \tag{22}$$

In the limit the A.R.L. is given by 

$$\frac{L(0)}{b} = \frac{1 + XN(b)/b}{X\{1 - P(b)\}}. \tag{23}$$


5-2. Choice of scheme 

If the sampling fraction is constant, the A.R.L. is proportional to the average number of 
articles produced before the inspection scheme causes action to be taken. In some 
applications it will be more important that stoppages when the quality is good be very infrequent 
while a fairly lengthy run of poor quality can be tolerated; in other cases a rapid action for 
bad quality is of prime importance. Sequential schemes to meet these requirements can be 
selected; e.g. for the first situation we choose the sequential scheme with specified A.R.L. 
on $\theta_0$ (good quality) and minimum A.R.L. on $\theta_1$ (bad quality). From (23), $pL(0)$ is a function 
of $X = pb$; consequently, for each $h$ we can find $X$, and hence $b$, such that the A.R.L. is $L_0$ 
when the fraction defective is $p_0$. The A.R.L. function for a few values of $h$ may be calculated 
using the tables of Wald schemes given by Anscombe (1949); for the selection of schemes to 
the above requirements more extensive tables are needed; these are shown in the Appendix.‡ 
$pL(0)$ is tabulated as a function of $X$ for $h/b = 1.25\ (0.25)\ 5.00$. For $h/b \le 1$, $pL(0) = 1$. As an 
example, the scheme with $h/b = 2.75$ and A.R.L. 4000 at $p_0 = 0.02$ has $p_0 L(0) = 80$ at 
$X \approx 0.35$. Hence the required scoring is $b = 17.5$. The A.R.L. at $p_1 = 0.05$ is given by 
$p_1 L(0) = 11.6$, i.e. $L(0) = 232$, from the value $X_1 = 0.875$. In order to select schemes 
with given A.R.L. at $p_0$ and maximum or minimum A.R.L. at $p_1$ the schemes for different $h/b$ 
can be compared and the appropriate value chosen. 

† With this scoring a rise in the mean sample path corresponds to a rise in the fraction defective. 
Those who prefer to think in terms of a fall in the standard of production will wish to reverse the signs 
of the scores. 

‡ I am indebted to the Director of the Mathematical Laboratory, University of Cambridge, for 
permission to use EDSAC for the preparation of these tables. 
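
The arithmetic of the worked example with $h/b = 2.75$ can be retraced step by step. In the sketch below the pairs ($X = 0.35$, $pL(0) = 80$) and ($X = 0.875$, $pL(0) = 11.6$) are simply the tabulated values quoted in the text; the function names are hypothetical.

```python
def scoring_from_table(p0, X0):
    """Defective score b such that X = p0 * b equals the tabulated argument X0."""
    return X0 / p0

def arl_from_table(p, pL):
    """A.R.L. L(0) recovered from a tabulated value of p * L(0)."""
    return pL / p

if __name__ == "__main__":
    p0, p1 = 0.02, 0.05
    b = scoring_from_table(p0, X0=0.35)     # about 17.5
    X1 = p1 * b                             # about 0.875, the argument at the poorer quality
    print(b, X1)
    print(arl_from_table(p0, pL=80.0))      # about 4000, the specified A.R.L. at p0
    print(arl_from_table(p1, pL=11.6))      # about 232, the A.R.L. at p1
```

Repeating the calculation for each tabulated $h/b$ gives the set of candidate schemes among which the comparison is made.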

It has been remarked (see § 2-2) that the process inspection scheme may be carried 
out as tests (Wald, 1947); it will, however, probably be found more convenient to record the 
cumulative score $S_n$ on a chart and take action whenever $S_n - \min S_i \ge h$. As only the 
difference between $S_n$ and the minimum is of importance only this information need be 
entered on a new chart when the old one is completed. If the old chart is withdrawn when 
the current cumulative score is the minimum the new chart may be started afresh. 


5-3. Comparison of sequential and simple sampling schemes 


The single sampling scheme in which a fixed number, $N$, of articles is inspected on each 
occasion and action is taken if $c$ or more defectives are observed has A.R.L. $L^*$, where 

$$L^* = N \Big/ \sum_{i=c}^{N} \binom{N}{i} p^i q^{N-i}. \tag{24}$$

For $p$ small and $N$ large the Poisson approximation may be used; Molina's tables (1947) 
will be found useful for selecting the best scheme of this type (Page, 1954). In Table 1 
the A.R.L. function is given for two values of the argument, $p$, for five single sampling 
schemes and in Table 2 the corresponding quantities for five sequential schemes taken 
from the Appendix table; the first three schemes have specified A.R.L. at $p = 0.01$ and 
minimum A.R.L. at $p = 0.03$ and the last two specified A.R.L. at $p = 0.03$ and maximum 
A.R.L. at $p = 0.01$. 
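
Formula (24) can be evaluated with the exact binomial tail. A minimal sketch, with a hypothetical function name, applied to the first scheme of Table 1 ($N = 63$, $c = 3$):

```python
from math import comb

def single_sampling_arl(N, c, p):
    """A.R.L. (24): N divided by the probability of c or more defectives in N articles."""
    q = 1.0 - p
    p_action = sum(comb(N, i) * p ** i * q ** (N - i) for i in range(c, N + 1))
    return N / p_action

if __name__ == "__main__":
    print(round(single_sampling_arl(63, 3, 0.01)))   # close to the tabulated 2,500
    print(round(single_sampling_arl(63, 3, 0.03)))   # close to the tabulated 214
```

As $p \to 1$ the probability of action tends to one and $L^*$ tends to $N$, which is why a single sampling scheme cannot detect even a very large change in fewer than $N$ observations.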

















Table 1. Single sampling schemes 

   N      c     L*(0.01)    L*(0.03) 
   63     3      2,500        214 
  103     4      5,000        275 
   —      —      8,000        321 
   70     —      2,050        200 
  165     5      6,300        300 

Table 2. Sequential schemes 

  h/b     b     L(0.01)     L(0.03) 
  2.75    60     2,500        180 
  3.75    65     5,000        220 
  4.25    70     8,000        263 
  3.00    55     3,600        200 
  3.75    50    13,000        300 























For the first three schemes $L(p = 0.03) \approx 0.8 L^*(p = 0.03)$, while the difference in the 
A.R.L.'s at $p = 0.01$ for the last two is proportionately rather more. 

It will, however, be noticed that if a deterioration in quality much worse than $p = 0.03$ 
occurs the A.R.L. of the single sampling schemes is at least equal to the sample size while 
the lower limit of the A.R.L. of the sequential schemes is the largest integer less than $h/b + 1$. 
Accordingly the sequential schemes have the advantages of single sampling schemes using 
small samples for the detection of large changes in quality while retaining a desirable A.R.L. 
for small changes. These two requirements are in conflict for the choice of the best sample 
size for single sampling schemes; the best sample size becomes large as the magnitude of 
the change decreases. 

The binomial process inspection scheme may be applied to continuous experiments 
designed to increase the rate of occurrence of a desirable but rare property. An analogous 
problem has been considered by Fisher (1952). 
6. APPLICATIONS TO STORAGE PROBLEMS 
6-1. General remarks 
Storage and queuing problems have received attention in a number of recent papers (e.g. 
Kendall, 1951; Lindley, 1952; Smith, 1953). Many of the problems considered in these 
papers have placed no restriction on the capacity or content of the store except that the 
system should be in the equilibrium state where the distribution of input and output are 
such that the content of the store does not increase without limit. In this section some 
problems concerned with stores of finite capacity will be discussed briefly. 


6-2. A storage problem 
Consider a store of total capacity $h$; at certain times ('movement epochs') goods of 
amount $x$ are sent out and, at the same time, an amount $y$ is received into the store. If the 
previous contents of the store are insufficient to meet the demand $x$, the deficiency is made 
up as far as possible from the new input. If the store would be left overfilled after a movement 
epoch some of the input is rejected so that the store is left just filled. Let $S_n$ be the store 
content after the $n$th movement epoch. Then 

$$S_n = \min(h, S_{n-1} + y - x), \qquad S_0 = Z, \tag{25}$$

where $Z$ is the initial content of the store. 

We are interested in the occasions when the store is empty and the demand cannot be 
satisfied; that is to say, when $S_n$ becomes negative. The situation is that of the process 
inspection Rule 3 to detect a decrease in the quality number when the initial score is $Z$. 
The A.R.L. is the average number of movement epochs before the store becomes empty. 
We have 

$$L(Z) = N(Z) + \{1 - P(Z)\} L(h). \tag{26}$$

The probability that the store becomes empty before becoming full is $P(Z)$, and the 
expected number of movement epochs between successive epochs of shortage (i.e. when the 
store is empty) is N,(0). The number of consecutive epochs of shortage is a geometric variable 
with parameter $\pi = \int_{-\infty}^{0} f(z)\,dz$, where $f(z)$ is the frequency function of $z = y - x$. 

If any unsatisfied demand is allowed to be supplied out of succeeding inputs, so that, for 
$S_n < 0$, $|S_n|$ is the total demand outstanding, the average number of movement epochs 
until the store becomes full is the A.S.N. of the Wald test with a single boundary at $h$ and 
initial score $Z$. 

6-3. Stores with a steady drain 


Many problems of practical interest involve stores from which there is a steady output, 
e.g. storage of fuel for blast furnaces (this has been the subject of some at present unpublished 
work by H. Herne and D. G. Nikolls). If the store is allowed to accept input whenever the 
content of the store is <h (even if the input causes the new total content to exceed h) but 
must reject otherwise, the expected time to an overflow can be written down as before. To 
find the expected time to the exhaustion of the store we need, in general, to create a regener- 
ation point after the store has overflowed; for example, we can insert the condition that the 
next input after an overflow shall arrive when the content is $Z_0$. The previous methods 
may then be applied. 
When the upper limit, $h$, of the store may never be exceeded the problem is more 
complicated. The net increment between inputs is no longer sufficient to describe the behaviour 
of the store content. In the Wald test it is necessary to take two separate steps, first the 
amount of input and then the drain. The simple theory can be applied in some special cases. 
If the amount of input on each occasion is fixed (and equal to unity, say) and the drain 
between input epochs is a random variable, $y$, a Wald test with boundaries $(0, h - 1)$ and 
increments $1 - y$ may be used to construct the theory as before. Similarly, a Wald test with 
boundaries $(1, h)$ and increments $x - 1$ may be used when there are inputs of amount $x$ at 
unit time intervals. When the random variable has the exponential distribution an explicit 
solution for the characteristics of the Wald test is available (Anscombe & Page, 1954). 


7. DEFERRED SENTENCING SCHEMES 
7-1. General remarks 


Process inspection schemes are designed to detect variations in the quality of output as 
it is being produced so that faults in the process may be traced promptly. In some cases 
it is impracticable to apply such a scheme, but after production it is necessary that runs of 
good and bad quality be separated, i.e. the output must be sentenced. Different action 
will then be taken with the two qualities. In the original formulation the problem is that of 
identifying the subsamples with the different parameter values. Some deferred sentencing 
schemes for articles classed as defective or not have been given by Anscombe et al. (1947); 
in these schemes articles are sampled and tested one by one and rules for the rejection of 
output are based on the occurrence of d defectives in any sequence of N articles examined. 
Roughly speaking, these moving average rules accept the output until a deterioration in 
quality is noticed and then reject until an improvement appears. Similar schemes can be 
constructed from process inspection schemes. 


7-2. Repeated process inspection schemes 

When a deterioration in quality is shown by a deviation in one direction (an increase, 
say) of the quality number, a deferred sentencing scheme may be obtained by applying 
two process inspection schemes consecutively. First, apply Rule 2 to detect an increase 
in $\theta$; let $l$ be the position of the minimum $S_i$ when the rule operates. Then apply Rule 2 to 
detect a decrease in $\theta$; let $m$ be the position of the maximum $S_i$. Reject articles $l+1$ and $m$ 
and all output within those limits. Because there is probability one that the process 
inspection scheme operates even if the quality, $\theta$, remains constant, some of the output will 
be rejected by this deferred sentencing scheme whatever the value of $\theta$, satisfactory or not. 
In the notation of § 2-2 with $P^*(\theta)$, etc., written for the characteristics of the Wald test 
with boundaries $(-k, 0)$, an estimate of the proportion of output rejected is 
\[ \frac{N_1(\theta) + \dfrac{\{1 - P^*(\theta)\}\,N^*(\theta)}{P^*(\theta)}}{L(\theta) + L^*(\theta)}. \tag{27} \]


Two-sided deferred sentencing schemes may be obtained in a similar manner. For 
example, a scheme could be: 

Apply Rule 5 until it operates; suppose that it does so because of a hit on the $H_1$ ($H_2$) 
boundary (thus suggesting a decrease (increase) in $\theta$). Then apply Rule 1 to detect an 
increase (decrease) in $\theta$; when it operates repeat the cycle. 





The schemes suggested above may be regarded as a succession of Wald tests with the rule 
that all output leading to a rejection test is rejected. Other schemes may be constructed 
using different sequences of tests. 


8. RECTIFYING SCHEMES 
8-1. General remarks 


The main purpose of rectifying inspection is to check that the output maintains a given 
standard and to improve the quality by replacing defective articles if necessary. Often it is 
required to ensure that the average quality of the product after inspection (the average 
outgoing quality: A.O.Q.) is better than some specified standard whatever be the quality before 
inspection. We suppose that articles are either defective or non-defective. A fraction 
(possibly large) of the output is inspected and any defective articles are rectified or replaced 
by non-defectives. Several schemes for the rectifying inspection of continuous output have 
been suggested, notably those of Dodge (1943) and Wald & Wolfowitz (1945). In these 
schemes a fraction, f <1, of the output is inspected until there are signs that the quality is 
not as good as that required; 100% inspection is then started and continued until an im- 
provement is noticed, when the cycle is repeated. We consider schemes in which the signs 
of improvement or deterioration are based on the results of tests repeatedly applied. 


8-2. Repeated tests 

Suppose that we are given two statistical tests, $T$ and $T^*$, for the quality of the production 
and that each has two possible results, acceptance and rejection. Then we consider the rule: 

‘Inspect a fraction $f$ of the output and apply test $T$: if it accepts, reapply $T$; if it rejects, 
start 100 % inspection and apply $T^*$. If $T^*$ rejects, continue 100 % inspection and reapply 
$T^*$; if it accepts, inspect a fraction $f$ and restart the cycle of tests.’ 

When the production process is under control so that the proportion of defectives, $p$, 
produced remains constant we can express the A.O.Q. in terms of the A.R.L.'s $L$ and $L^*$, of 
the process inspection schemes obtained by repeatedly applying $T$ and $T^*$ respectively. 


We have 
\[ \text{A.O.Q.} = \frac{pL(1-f)}{L + fL^*}. \tag{28} \]
The fraction of production inspected, $I(p)$, is given by 
\[ I(p) = \frac{f(L + L^*)}{L + fL^*}. \]
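
The A.O.Q. and the fraction inspected can be checked by counting articles over one complete cycle of the rule. The following Python sketch does that bookkeeping explicitly. It assumes (this is not stated in the text) that the A.R.L.'s $L$ and $L^*$ are counted in articles inspected, so that a run of fractional inspection spans $L/f$ produced articles; the function and argument names are illustrative only.

```python
def cycle_characteristics(p, f, L, L_star):
    """Return (A.O.Q., fraction inspected) over one cycle of the repeated-test rule.

    Assumption: L and L_star are average run lengths counted in inspected
    articles, so a run of fractional inspection spans L / f produced articles.
    """
    if not 0 < f <= 1:
        raise ValueError("f must lie in (0, 1]")
    produced = L / f + L_star            # articles produced per cycle
    inspected = L + L_star               # articles actually inspected
    defectives_passed = p * (L / f - L)  # defectives among uninspected articles
    return defectives_passed / produced, inspected / produced


aoq, fraction = cycle_characteristics(p=0.1, f=0.5, L=10.0, L_star=5.0)
```

Under this assumption the two ratios reduce algebraically to $pL(1-f)/(L+fL^*)$ and $f(L+L^*)/(L+fL^*)$.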


More complex rules can be formulated in which more than two tests are used or in which 
succeeding tests depend upon characteristics of preceding ones. Tests allowing more than 
two decisions may be used to cater for multi-dimensional inspection. 


8-3. Special cases 
The scheme suggested by Dodge is: 
‘Inspect a fraction $f$ of the output until a defective occurs; then do 100 % inspection until 
a run of $i$ non-defectives is observed.’ In this case the test $T$ may be taken as that based 
on a fixed sample size of one article with acceptance for a non-defective and rejection for 
a defective. $T^*$ is a binomial sequential test with boundaries $(0, i)$, starting score $i$ and 
penalty for a defective $> i$. 


Biometrika 41 8 
This scheme bases its decision to start 100 % inspection on the appearance of one defective 
only; if this treatment of a first offender seems harsh it may be preferred to use for $T$ 
a sequential test with initial score zero and boundaries $(0, h)$, where $h$ is greater than the 
penalty for a defective. 

Schemes may be obtained using Wald tests with general starting scores, but disadvantages 
similar to those given for process inspection schemes are present when the quality is 
not constant; these same disadvantages will apply in greater or less degree to the use of all 
tests with an A.S.N. much greater than one. 

For the sampling plan A of Wald and Wolfowitz, $T$ is a sequential test with only a rejection 
boundary and $T^*$ a fixed size test with only an acceptance criterion. The position of the 
boundary in the next test $T$ and the number of observations in $T^*$ depend on the overlap 
on the boundary in the preceding test. 


I wish to express my thanks to Mr F. J. Anscombe, who suggested this subject of research 
and supervised it, and to Dr D. R. Cox, Mr D. V. Lindley and Dr J. Wishart for reading the 
draft of the paper. I also wish to thank the Department of Scientific and Industrial 
Research for the award of a maintenance grant. 


ON NAHORDNUNG AND FERNORDNUNG IN SAMPLES OF 
LITERARY TEXTS 


By WILHELM FUCKS 
Physikalisches Institut, Aachen 


1. INTRODUCTION 


In a previous paper (Fucks, 1952, cited as I), several possibilities were outlined to describe 
the style of samples of literary texts by means of mathematical characteristics. The com- 
puting of practical examples, however, was confined to the most simple ones of these style 
characteristics. Most of them were derived from the frequency distributions of the pro- 
perties of elementary units of the text themselves. It was, however, pointed out briefly, 
that the properties of groups of elementary units of the text can also be used to define 
characteristic numbers and, as an example, the values of an $l_{ik}$-matrix were given. 

To explain the aim of this paper, we assume the text to be divided into elements. The 
elements will be denoted by the numbers 1, 2, ..., $A$. It is perfectly arbitrary what parts of 
the text are chosen as elements: syllables, words, parts of the phrases, whole phrases, 
chapters, etc., or substantives, adjectives, verbs, etc., or cases of substantives, tenses of 
verbs, etc., or metric elements, and so on. The elements are to be considered as characterized 
by different marks, one or several of them for each element. These marks (or properties) 
can be chosen, within certain limits, in an arbitrary manner. However, to clarify our con- 
ception with the help of a special example, we shall generally in this paper, as in paper I, 
choose the words of a text as elements and the numbers of the syllables of the words as the 
interesting properties of these elements. 

Consequently, we shall now assume that the text is broken down into the words; and that 
these have been mixed together and then picked up, one after the other, perfectly at random 
and given the numbers 1, 2,..., A. Theoretically, if we should repeat this procedure often 
enough, we should obtain the natural order of the original text as one of the different 
arrangements arising from picking up the words at random. We conclude, therefore, that 
we do not restrict the generality of our conclusions, if we start by marking the words of the 
text in their original order by the numbers 1, 2,..., A. 

Now, let us assume that we have computed the values of $p_i$, the proportion of the $A$ words 
of a given text having $i$ syllables, and also the various characteristic numbers calculated in I, 
namely, $\bar{i}$, the mean number of syllables per word; $S = -k \sum_i p_i \log p_i$, the ‘entropy’; and 
the ‘trace’, $s$. Considering these results, we ask how much of the structure of the text is 
described in an exact manner by these characteristic numbers. The answer is that our 
characteristic numbers do not tell us much more of the text than the vapour of a mixture of 
several fluids tells us about the fluid-structures proper, i.e. the real structure of the text 
does not enter into the characteristic numbers so far studied in detail. What we have taken 
into account with the characteristic numbers actually computed in I is the choice of a certain 
number, z, of words out of the reservoir V of the words of the language in which the text 
is written. The words enter into our characteristics as formed grammatically. But the gram- 
matical forming process is contained in our characteristic numbers only in a rudimentary 
way, i.e. only in so far as the numbers of the syllables of the words are affected. 





It may be regarded as rather noteworthy, that so much can be achieved in characterizing 
the style of an author, starting our calculus with such scanty material. We have, indeed, 
until now not used the various moments of the frequency distributions computed in I. 
In the meantime the moments of the orders two to four and other higher characteristic 
numbers of the distributions discussed in I have been calculated and the results will be 
published in another paper. 

All the characteristic numbers calculated from properties of the fundamental text 
elements themselves will hence be called characteristic numbers of the first order. If we 
calculate, however, a characteristic number using simultaneously the properties of groups 
of 2, 3, ..., $n$ fundamental elements at 2, 3, ..., $n$ different positions in the text, we will 
describe those characteristic numbers as of the second, third, ..., nth order. 

Besides these distinctions another distinction is important. If the groups of elements 
considered for the computing of a characteristic number of an order higher than one are 
arranged in the original text in the immediate neighbourhood of one another, we speak of 
Nahordnung; if not, we speak of Fernordnung. It will be shown that it is advisable to define 
different types of characteristic numbers for both these cases. Certain characteristic 
numbers useful for the study of groups of consecutive elements (Nahordnung) prove either 
mathematically or practically uninteresting when applied to the study of the relations 
between the properties of distant parts of the text (Fernordnung) and vice versa. 

The distinction between Nahordnung and Fernordnung is perfectly unique when applied 
to characteristic numbers of the second order; applied to characteristic numbers of an order 
higher than two it is not. There are, obviously, characteristic numbers of an order higher 
than two that are pure Nahordnungs-cases, and, on the other hand, characteristic numbers 
of an order higher than two that are pure Fernordnungs-cases, but there exist mixed cases 
as well. 

The aim of the present paper is to define characteristic numbers describing Nahordnungs- 
and Fernordnungs-relationships of any order. The calculation of examples of Nahordnungs- 
and Fernordnungs-characteristic numbers, however, will be confined to the second order 
exclusively. 


2. NAHORDNUNG 


A sample of a text may be given as follows: 


\[ \binom{e_1}{f_1} \binom{e_2}{f_2} \binom{e_3}{f_3} \cdots \binom{e_{i-1}}{f_{i-1}} \binom{e_i}{f_i} \binom{e_{i+1}}{f_{i+1}} \cdots \binom{e_A}{f_A}, \]

the elements $e_1, e_2, \ldots$ having certain properties $f_1, f_2, \ldots$ respectively. The $e_i$ may be, for 
instance, assumed to be the words and the $f_i$ to be the numbers of syllables of the words. 


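We now break up the text into pairs, triplets, ..., n-tuplets of elements as indicated in the 
following lines (a), (b), (c), etc.: 


(a)  $\mid e_1\, e_2 \mid e_3\, e_4 \mid e_5\, e_6 \mid e_7\, e_8 \mid e_9\, e_{10} \mid e_{11}\, e_{12} \mid \ldots$ 
(b)  $\mid e_1\, e_2\, e_3 \mid e_4\, e_5\, e_6 \mid e_7\, e_8\, e_9 \mid e_{10}\, e_{11}\, e_{12} \mid \ldots$ 
(c)  $\mid e_1\, e_2\, e_3\, e_4 \mid e_5\, e_6\, e_7\, e_8 \mid e_9\, e_{10}\, e_{11}\, e_{12} \mid \ldots$ 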


and consider the properties connected with the first, the second, the third, etc., groups in 
each line. The arrangement of the groups one after the other may be supposed, for the 
moment, to be that of the original text. Thus we find sets of properties bound together by 
their being found in 2, 3, ... consecutive elements of the text, thus: 

for $n = 2$: $f_i, f_{i+1}$; 
for $n = 3$: $f_i, f_{i+1}, f_{i+2}$; 
for $n = k$: $f_i, f_{i+1}, \ldots, f_{i+k-1}$. 

We now compute the relative frequency distributions of the n-tuplets for the whole of 
the given text, defining thereby matrices of the order $n$: 

\[ p_{ik},\; p_{ikl},\; \ldots,\; p_{ikl\ldots n}. \]


These matrices, and certain numbers that will be derived from them afterwards, prove to 
be most appropriate characteristic numbers to describe relations between the properties 
of groups of neighbouring elements. 

It will be noticed that the splitting up process of the text into groups as described above 
is not unique. With binary groups there are two possibilities: 


\[ \mid e_1\, e_2 \mid e_3\, e_4 \mid e_5\, e_6 \mid e_7 \ldots \quad\text{and}\quad e_1 \mid e_2\, e_3 \mid e_4\, e_5 \mid e_6\, e_7 \mid \ldots \]


Generally, with groups consisting of n elements, there are n different possibilities. If the 
text is written in prose, there will, in most cases, be no difference with regard to the style 
characteristics if we break up the text in any of these ways. With a text written in verse, 
however, there may be a remarkable difference according to the phase of the beginning of 
the process of subdividing the text, especially if the properties of the elements entering into 
the characteristic numbers are metric ones. 

There are obviously additional possibilities for dividing the text into groups, if we admit 
combinations of the possibilities described so far. For binary groups, for instance, we can 
break up the text in both ways described above and calculate the $p_{ik}$-matrix using the 
frequency distributions of both sets of groups (hence called distributions from the combined 
subdivision). 
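
The subdivisions described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_groups` and its `phase` argument (the number of leading elements skipped before grouping starts) are assumptions introduced here, not notation from the paper.

```python
def split_groups(elements, n, phase=0):
    """Split a sequence into consecutive, non-overlapping n-tuples.

    `phase` (0 <= phase < n) is the number of leading elements skipped,
    which selects one of the n possible subdivisions.
    """
    if n < 1 or not 0 <= phase < n:
        raise ValueError("need n >= 1 and 0 <= phase < n")
    return [tuple(elements[start:start + n])
            for start in range(phase, len(elements) - n + 1, n)]


text = list(range(1, 13))                 # stands for e_1, ..., e_12
first = split_groups(text, 2, phase=0)    # | e1 e2 | e3 e4 | ...
second = split_groups(text, 2, phase=1)   # e1 | e2 e3 | e4 e5 | ...
combined = first + second                 # the combined subdivision
```

For binary groups the combined subdivision contains every pair of consecutive elements exactly once, i.e. $A-1$ pairs for a text of $A$ elements.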

We may ask, to what extent the matrices $p_{ik\ldots n}$ take account of the very structure of the 
text. After forming the groups of $n$ elements we may combine these groups and, in order to 
perform the calculations described in the following chapters, we may pick up the groups 
perfectly at random one after the other and give them consecutive numbers 1, 2, ..., $A$, 
without changing the matrices $p_{ik\ldots n}$ or the characteristic numbers derived from them. The 
amount to which the structure proper enters our style characteristics is obviously 
appropriately described by the term Nahordnung of the text. 


2-1. The matrix $p_{ik}$ of the frequency distribution of binary groups 
We assume a text consisting of $A$ elements $e_1, e_2, \ldots, e_A$ with the properties $f_1, f_2, \ldots, f_A$, 
respectively. After applying the combined subdivision described above we compute the 
number $A_{ik}$ of the $A-1$ pairs of consecutive elements that bear the properties $i$ and $k$, 
respectively. In this way we find the distributions of the relative frequencies of the 
$(i, k)$-pairs: 
\[ p_{ik} = \frac{A_{ik}}{A-1}. \tag{1} \]
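
As a rough illustration of how such a matrix might be tabulated, the Python sketch below counts all pairs of consecutive elements in a sequence of syllable counts and divides by the number of pairs. The function name, the dictionary representation and the toy input are assumptions made for this example; the normalization by $A-1$ is likewise an assumption, chosen so that the frequencies sum to one.

```python
from collections import Counter


def pair_frequencies(syllables):
    """Relative frequencies of consecutive (i, k) pairs, keyed by (i, k).

    Uses every pair of neighbouring elements (the combined subdivision),
    so a text of A elements contributes A - 1 pairs.
    """
    pairs = list(zip(syllables, syllables[1:]))
    if not pairs:
        raise ValueError("need at least two elements")
    counts = Counter(pairs)
    return {ik: count / len(pairs) for ik, count in counts.items()}


# Hypothetical toy text: syllable counts of seven successive words.
p = pair_frequencies([1, 2, 1, 1, 3, 1, 2])
```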


The following relations obviously hold for the $p_{ik}$-matrices: 
\[ \sum_i \sum_k p_{ik} = 1, \tag{2} \]
\[ \sum_k p_{ik} = p_i, \qquad \sum_i p_{ik} = p_k. \tag{3} \]
With the words as elements and the number of the syllables per word as the properties, 
we find for some of the authors and works studied in I the following $p_{ik}$-matrices: 
Table 1. $p_{ik}$-matrix for Shakespeare, Othello 

         1        2        3        4        5
1     0.6137   0.1249   0.0392   0.0103   0.0007
2     0.1246   0.0190   0.0074   0.0012   0.0002
3     0.0386   0.0057   0.0025   0.0006   0.0001
4     0.0102   0.0015   0.0004   0.0004   0.0000
5     0.0010   0.0000   0.0000   0.0000   0.0000

Table 2. $p_{ik}$-matrix for Rilke, Cornet 

         1        2        3        4        5
1     0.3936   0.1796   0.0396   0.0101   0.0004
2     0.1808   0.1003   0.0186   0.0032   0.0000
3     0.0376   0.0202   0.0020   0.0000   0.0000
4     0.0089   0.0045   0.0000   0.0000   0.0000
5     0.0004   0.0000   0.0000   0.0000   0.0000

Table 3. $p_{ik}$-matrix for Hesse, Steppenwolf 

          1         2         3         4         5         6         7
1     0.26556   0.16813   0.05732   0.02019   0.00550   0.00140   0.00023
2     0.16331   0.09495   0.03596   0.01442   0.00324   0.00086   0.00009
3     0.06147   0.03272   0.01190   0.00446   0.00140   0.00023   0.00005
4     0.02068   0.01374   0.00469   0.00306   0.00054   0.00014   0.00000
5     0.00518   0.00324   0.00158   0.00068   0.00014   0.00000   0.00000
6     0.00122   0.00104   0.00023   0.00009   0.00000   0.00000   0.00000
7     0.00018   0.00009   0.00000   0.00005   0.00009   0.00000   0.00000

Table 4. $p_{ik}$-matrix for Mann, Buddenbrooks 

         1        2        3        4        5        6        7        8
1     0.2616   0.1620   0.0586   0.0238   0.0055   0.0012   0.0002   0.0001
2     0.1664   0.0982   0.0318   0.0151   0.0034   0.0008   0.0002   0.0001
3     0.0543   0.0355   0.0119   0.0056   0.0013   0.0002   0.0000   0.0001
4     0.0239   0.0157   0.0042   0.0026   0.0010   0.0002   0.0001   0.0000
5     0.0056   0.0037   0.0011   0.0007   0.0003   0.0001   0.0000   0.0000
6     0.0013   0.0008   0.0001   0.0001   0.0001   0.0000   0.0000   0.0000
7     0.0002   0.0001   0.0001   0.0001   0.0000   0.0000   0.0000   0.0000
8     0.0002   0.0001   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000

Table 5. $p_{ik}$-matrix for Mann, Zauberberg 

         1        2        3        4        5        6        7        8
1     0.2643   0.1618   0.0590   0.0256   0.0077   0.0020   0.0007   0.0001
2     0.1608   0.0792   0.0348   0.0142   0.0041   0.0010   0.0002   0.0001
3     0.0607   0.0355   0.0127   0.0059   0.0012   0.0003   0.0000   0.0001
4     0.0256   0.0135   0.0065   0.0032   0.0007   0.0001   0.0001   0.0000
5     0.0066   0.0042   0.0023   0.0006   0.0002   0.0000   0.0000   0.0000
6     0.0019   0.0007   0.0006   0.0002   0.0000   0.0000   0.0000   0.0000
7     0.0003   0.0003   0.0002   0.0002   0.0000   0.0000   0.0000   0.0000
8     0.0001   0.0001   0.0000   0.0001   0.0000   0.0000   0.0000   0.0000

Table 6. $p_{ik}$-matrix for Carossa, Geheimnisse des reifen Lebens 

          1         2         3         4         5         6         7
1     0.24200   0.16840   0.06492   0.02017   0.00557   0.00040   0.00018
2     0.16857   0.09502   0.03673   0.01509   0.00374   0.00076   0.00022
3     0.06211   0.03745   0.01305   0.00588   0.00169   0.00027   0.00004
4     0.02213   0.01492   0.00387   0.00187   0.00058   0.00004   0.00000
5     0.00646   0.00365   0.00134   0.00045   0.00013   0.00000   0.00000
6     0.00094   0.00045   0.00022   0.00009   0.00004   0.00000   0.00000
7     0.00018   0.00018   0.00004   0.00000   0.00004   0.00000   0.00000

Table 7. $p_{ik}$-matrix for Jaspers, Der philosophische Glaube 

          1         2         3         4         5         6         7         8         9
0-00000 | 0-00000 | 0-00000 | 0-00000 | 0-00000 
0-00003 | 0-00000 


1     0.23877   0.13408   0.06872   0.04215   0.01824   0.00463   0.00078   0.00014   0.00009
2     0.13868   0.06090   0.02758   0.01609   0.00689   0.00141   0.00017   0.00023   0.00000
3     0.06950   0.02822   0.01273   0.00793   0.00282   0.00072   0.00011   0.00003   0.00003
4     0.03962   0.01626   0.00954   0.00523   0.00267   0.00049   0.00016   0.00006   0.00003
5     0.01494   0.00888   0.00422   0.00259   0.00149   0.00049   0.00003   0.00003   0.00000
6     0.00385   0.00198   0.00124   0.00046   0.00029   0.00014   0.00003   0.00003   0.00000
7 
8 
9 


0-00009 | 0-00000 | 0-00003 | 


| 


A closer study of the matrices shows remarkable differences with regard to the different 
works from which they are derived. Certain characteristic factors, which bring out more 
clearly the differences of the style of the different works, will be defined and computed in 
the following sections. 


0-00000 








| 

0-00029 | 0-00003 | 0-00003 | 0-00000 
0-00000 | 0-00000 | 0-00000 
| | 


| 
| 











} 
| 
| 
      0.00060   0.00032   0.00026   0.00003   0.00000   0.00000   0.00000   0.00000   0.00000





2-2. The moments of the first order of the $p_{ik}$-matrices 
We study the moments $M_1^i$ and $M_1^k$: 

\[ M_1^i = \sum_i \sum_k i\, p_{ik}, \qquad M_1^k = \sum_i \sum_k k\, p_{ik}. \tag{4} \]

If we consider the $p_{ik}$-distribution as a distribution of masses in a plane co-ordinate system 
with the co-ordinates $i$ and $k$, the values of $M_1^i$ and $M_1^k$ give the co-ordinates $i^*$ and $k^*$ of the 
mass centre of this mass distribution. With the words as elements and the number of syllables 
per word as properties, there holds the relation $\sum_i i \sum_k p_{ik} = \sum_k k \sum_i p_{ik}$. Hence we obtain 

\[ M_1^i = M_1^k = \bar{i}. \]
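
The mass-centre interpretation can be made concrete with a small Python sketch. The matrix below is a hypothetical 2 × 2 example (not taken from Tables 1-7), rows being indexed by $i$ and columns by $k$, both starting at 1; the function name is an assumption of this example.

```python
def first_moments(p):
    """Return the co-ordinates (i*, k*) of the mass centre of a p_ik matrix.

    p is a list of rows; p[i - 1][k - 1] holds p_ik.
    """
    m_i = sum((i + 1) * value for i, row in enumerate(p) for value in row)
    m_k = sum((k + 1) * value for row in p for k, value in enumerate(row))
    return m_i, m_k


symmetric = [[0.5, 0.2],
             [0.2, 0.1]]
skewed = [[0.5, 0.3],
          [0.1, 0.1]]
centre_symmetric = first_moments(symmetric)
centre_skewed = first_moments(skewed)
```

The two co-ordinates coincide whenever the row sums and the column sums of the matrix agree, as in the first matrix; in the second they differ.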


From the matrices given in Tables 1-7 we compute Table 8. 


Table 8. Mean values of the $p_{ik}$-matrix 

Author        Work               $M_1^i = M_1^k$
Shakespeare   Othello            1.2943
Rilke         Cornet             1.4506
Hesse         Steppenwolf        1.7245
Mann          Buddenbrooks       1.7376
Mann          Zauberberg         1.7567
Carossa       Geheimnisse        1.7494
Jaspers       Der phil. Glaube   1.8871

















There are, however, cases in which the relation mentioned above does not hold, e.g. 
$p_{ik}$ = relative frequencies of words per sentence and syllables per sentence. Characteristics 
derived in this way will be described in another publication. 


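2-3. The variance 


The study of the higher moments of the $p_{ik}$-matrices will be confined in this paper to 
the moments of the second order, i.e. the variance $M_2$. This moment is given by a symmetrical 
matrix with three independent matrix elements. If we consider the $p_{ik}$-matrix as described 
before as a mass distribution in a plane $(i, k)$-co-ordinate system, the three elements of the 
$M_2$-matrix are the two moments of inertia $\sigma_{11}$ and $\sigma_{22}$ with respect to the two axes $i = i^*$ 
and $k = k^*$, respectively, and the product of inertia $\sigma_{12} = \sigma_{21}$: 

\[ M_2 = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} \tag{5} \]
with 
\[ \begin{aligned} \sigma_{11} &= \sum_i \sum_k (i - M_1^i)^2\, p_{ik}, \\ \sigma_{22} &= \sum_i \sum_k (k - M_1^k)^2\, p_{ik}, \\ \sigma_{12} = \sigma_{21} &= \sum_i \sum_k (i - M_1^i)(k - M_1^k)\, p_{ik}. \end{aligned} \tag{6} \]

In the special case with $\sum_i i \sum_k p_{ik} = \sum_k k \sum_i p_{ik}$ we obtain 
\[ \sigma_{11} = \sigma_{22} = \sum_i i^2 p_i - \bar{i}^2. \]
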
The examples of texts studied in this paper lead to the values for the matrix $M_2$ of 
Table 9. In computing these values there arise small differences between $\sigma_{11}$ and $\sigma_{22}$ due 
to errors connected with the establishing of the matrices. In Table 9 the mean value of the 
calculated $\sigma_{11}$ and $\sigma_{22}$ values is given. 

Table 9. Variance of the $p_{ik}$-distributions 

Shakespeare, Othello 
\[ M_2 = \begin{pmatrix} 0.3911 & -0.0117 \\ -0.0117 & 0.3911 \end{pmatrix} \]
Rilke, Cornet 
\[ M_2 = \begin{pmatrix} 0.4543 & -0.0076 \\ -0.0076 & 0.4543 \end{pmatrix} \]
Hesse, Steppenwolf 
\[ M_2 = \begin{pmatrix} 0.8736 & +0.0110 \\ +0.0110 & 0.8736 \end{pmatrix} \]
Mann, Buddenbrooks 
\[ M_2 = \begin{pmatrix} 0.9104 & +0.0094 \\ +0.0094 & 0.9104 \end{pmatrix} \]
Mann, Zauberberg 
\[ M_2 = \begin{pmatrix} 0.9881 & +0.0017 \\ +0.0017 & 0.9881 \end{pmatrix} \]
Carossa, Geheimnisse des reifen Lebens 
\[ M_2 = \begin{pmatrix} 0.8774 & -0.0133 \\ -0.0133 & 0.8774 \end{pmatrix} \]
Jaspers, Der philosophische Glaube 
\[ M_2 = \begin{pmatrix} 1.4016 & -0.0912 \\ -0.0912 & 1.4016 \end{pmatrix} \]
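
A minimal Python sketch of the computation in (6) follows, using a hypothetical 2 × 2 matrix rather than the published tables. Rows are indexed by $i$ and columns by $k$, both starting at 1; the function name is an assumption of this example.

```python
def variance_matrix(p):
    """Return [[s11, s12], [s21, s22]] for a p_ik matrix given as a list of rows."""
    cells = [(i + 1, k + 1, value)
             for i, row in enumerate(p) for k, value in enumerate(row)]
    m_i = sum(i * v for i, _, v in cells)   # first moment in i
    m_k = sum(k * v for _, k, v in cells)   # first moment in k
    s11 = sum((i - m_i) ** 2 * v for i, _, v in cells)
    s22 = sum((k - m_k) ** 2 * v for _, k, v in cells)
    s12 = sum((i - m_i) * (k - m_k) * v for i, k, v in cells)
    return [[s11, s12], [s12, s22]]


m2 = variance_matrix([[0.5, 0.2],
                      [0.2, 0.1]])
```

The off-diagonal element can be cross-checked as $\sum\sum i k\, p_{ik} - M_1^i M_1^k$, which for this toy matrix is $1.70 - 1.69 = 0.01$.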


We transform the $M_2$-matrix on principal axes and we find the principal variances $\sigma_{\mathrm{I}}$ 
and $\sigma_{\mathrm{II}}$. In our mass distribution they are to be compared with the principal moments of 
inertia. We obtain their values as the solutions of the quadratic equation in $\lambda$: 

\[ \begin{vmatrix} \sigma_{11} - \lambda & \sigma_{12} \\ \sigma_{21} & \sigma_{22} - \lambda \end{vmatrix} = 0. \tag{7} \]
The results for our examples of text are given in Table 10. 


Table 10. Principal variances of the $p_{ik}$-distributions 

Author        Work               $\sigma_{\mathrm{I}}$   $\sigma_{\mathrm{II}}$
Shakespeare   Othello            0.3724   0.4028
Rilke         Cornet             0.4467   0.4619
Hesse         Steppenwolf        0.8846   0.8626
Mann          Buddenbrooks       0.9109   0.9099
Mann          Zauberberg         0.9864   0.9898
Carossa       Geheimnisse        0.8907   0.8641
Jaspers       Der phil. Glaube   1.3104   1.4928

In Fig. 1 we plot $\sigma_{\mathrm{II}}$ against $\sigma_{\mathrm{I}}$. 
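
For a symmetric 2 × 2 matrix the quadratic equation (7) has the closed-form roots $\tfrac12(\sigma_{11}+\sigma_{22}) \pm \sqrt{\tfrac14(\sigma_{11}-\sigma_{22})^2 + \sigma_{12}^2}$. The Python sketch below evaluates them; the function name is an assumption of this example, and the roots are returned in ascending order rather than in the order of the columns of Table 10.

```python
import math


def principal_variances(s11, s12, s22):
    """Return the two roots of equation (7), smaller root first."""
    centre = (s11 + s22) / 2.0
    radius = math.hypot((s11 - s22) / 2.0, s12)
    return centre - radius, centre + radius


# Entries of the M_2 matrices given in Table 9 for Rilke and for Jaspers.
rilke = principal_variances(0.4543, -0.0076, 0.4543)
jaspers = principal_variances(1.4016, -0.0912, 1.4016)
```

When $\sigma_{11} = \sigma_{22}$, as in Table 9, the roots are simply $\sigma_{11} \pm |\sigma_{12}|$.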


2-4. The correlation in the region of Nahordnung 


We compute, using the $M_2$-matrix, the correlation coefficient $c$: 

\[ c = \frac{\sigma_{12}}{\sqrt{\sigma_{11}\sigma_{22}}}. \tag{8} \]
The results are given in Table 11. 


Table 11. Correlation coefficients of the $p_{ik}$-matrices 

Author        Work               $c \times 10^2$
Shakespeare   Othello            -2.992
Rilke         Cornet             -1.673
Hesse         Steppenwolf        +1.259
Mann          Buddenbrooks       +1.033
Mann          Zauberberg         +0.174
Carossa       Geheimnisse        -1.519
Jaspers       Der phil. Glaube   -6.507

Fig. 1. Principal variances of the $p_{ik}$-distributions. 


The sign of the correlation coefficient $c$ is determined by the sign of $\sigma_{12}$ and indicates the 
orientation of the correlation ellipse relative to the $(i, k)$-co-ordinate system. A positive 
sign of $c$ means that a certain value of $i$ is preferentially followed by a value of $k$ which is 
equal or nearly equal to $i$, and therefore indicates a more ‘balanced’ character of the text. 
A negative sign means that $i$ is preferentially followed by a value of $k$ more different from $i$, 
thus indicating a more ‘rugged’ structure of the text. As all our correlation ellipses are 
very nearly circles, it will be seen that these trends are, in our examples, very slight ones. 
Nevertheless, they are very interesting having regard to the nature of our problem. 
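
As a quick numerical illustration of (8), the following Python sketch evaluates the correlation coefficient from the elements of an $M_2$-matrix and applies it to two of the matrices of Table 9. The function name is an assumption of this example.

```python
import math


def correlation(s11, s12, s22):
    """Correlation coefficient c = s12 / sqrt(s11 * s22), as in equation (8)."""
    return s12 / math.sqrt(s11 * s22)


c_othello = correlation(0.3911, -0.0117, 0.3911)
c_jaspers = correlation(1.4016, -0.0912, 1.4016)
```

Both values are negative and small in magnitude, in line with the nearly circular correlation ellipses described above.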


2-5. A measure for the skewness of the matrix $p_{ik}$ 


The matrices $p_{ik}$ of our examples (Tables 1-7) are not symmetrical ones, $p_{ik}$ not being 
equal to $p_{ki}$. The frequency with which $(i, k)$-pairs are formed is not the same as the frequency 
with which $(k, i)$-pairs are formed. 

We divide the $p_{ik}$-matrices in a well-known manner into a symmetrical and an anti-symmetrical 
matrix: 
\[ p_{ik} = \frac{p_{ik} + p_{ki}}{2} + \frac{p_{ik} - p_{ki}}{2}, \tag{9} \]

and we consider the mean-square value of the deviation from the symmetry, $\xi_-$: 

\[ \xi_- = \sqrt{\Bigl\{\sum_i \sum_k (p_{ik} - p_{ki})^2\Bigr\}}. \tag{10} \]
Correspondingly we define the values $\xi_+$: 

\[ \xi_+ = \sqrt{\Bigl\{\sum_i \sum_k (p_{ik} + p_{ki})^2\Bigr\}} \tag{11} \]
and 
\[ \xi_d = \sqrt{\Bigl\{\sum_i (p_{ii} + p_{ii})^2\Bigr\}} = 2\sqrt{\Bigl(\sum_i p_{ii}^2\Bigr)}. \tag{12} \]

The latter value is characteristic of the ensemble of the diagonal elements of the $p_{ik}$-matrix 
and shows, therefore, a certain relationship to the trace $s$: 

\[ s = \sum_i p_{ii}. \tag{13} \]
The calculation of the $\xi_-$- and $\xi_+$-values and of the trace $s$ gives results shown in Table 12. 


Table 12. Characteristic factors derived from the $p_{ik}$-matrix 

Author        Work               $\xi_- \times 10^2$   $\xi_+$    $s$       $\rho \times 10^2$
Shakespeare   Othello            0.585                 0.668      0.6357    0.876
Rilke         Cornet             0.330                 0.550      0.4959    0.600
Hesse         Steppenwolf        0.721                 0.459      0.3756    1.569
Mann          Buddenbrooks       0.765                 0.339      0.3764    2.258
Mann          Zauberberg         0.274                 0.357      0.3598    0.770
Carossa       Geheimnisse        0.422                 0.372      0.3521    1.134
Jaspers       Der phil. Glaube   0.703                 0.349      0.3193    2.014







We plot $\xi_-$ against $\xi_+$ in Fig. 2, and define another interesting number $\rho$ (henceforth called 
relative skewness factor of the matrix $p_{ik}$) as 
\[ \rho = \tan\gamma = \frac{\xi_-}{\xi_+}. \tag{14} \]


The values for $\rho$ are to be found in Table 12. 

We plot $\rho$ against $s$ in Fig. 3. 
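
The quantities of equations (10), (11), (13) and (14) can be sketched in Python as follows. The formulas are implemented literally as sums over all $(i, k)$; any additional normalization used for the published values of Table 12 is not known from the text, so the sketch is applied only to a hypothetical 2 × 2 matrix. The function name is an assumption of this example.

```python
import math


def skewness_factors(p):
    """Return (xi_minus, xi_plus, trace, rho) for a square p_ik matrix."""
    n = len(p)
    index_pairs = [(i, k) for i in range(n) for k in range(n)]
    xi_minus = math.sqrt(sum((p[i][k] - p[k][i]) ** 2 for i, k in index_pairs))
    xi_plus = math.sqrt(sum((p[i][k] + p[k][i]) ** 2 for i, k in index_pairs))
    trace = sum(p[i][i] for i in range(n))
    return xi_minus, xi_plus, trace, xi_minus / xi_plus


toy = [[0.5, 0.3],
       [0.1, 0.1]]
toy_transposed = [[0.5, 0.1],
                  [0.3, 0.1]]
factors = skewness_factors(toy)
factors_transposed = skewness_factors(toy_transposed)
```

Transposing the matrix, which corresponds to reading the text backwards, leaves all four numbers unchanged; only the signs of the individual differences $p_{ik} - p_{ki}$ are reversed.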

In forming the factor $\xi_-$ the sign of $(p_{ik} - p_{ki})$ is lost. As an example we give a scheme of 
the signs of $(p_{ik} - p_{ki})$ of Shakespeare’s Othello: 

If we reverse the sense of the direction in which the original text is read, the $i$ and $k$ are 
reversed too and the matrix $p_{ik}$ is transformed into the transposed matrix $p'_{ik} = p_{ki}$, and 
all the signs of the scheme are reversed also. 


Fig. 3. $\rho$ against $s$-values of the $p_{ik}$-matrices. 


2-6. The entropy of a text 


We consider the text as a mixture of units. If these units are elements as studied so far, 
we define a mixing-entropy-factor $S_1$, as given already in I, 

\[ S_1 = -k \sum_i p_i \log p_i. \tag{15} \]

In an analogy, we can compare this factor with the mixing-entropy for a mixture of 
several gases, each of which is distinguished from the other by having molecules constituted 
of a different number of atoms. 

Now, we assume the units constituting the text to be not the elements themselves but 
groups of elements as defined in the previous sections of this paper. Accordingly we define, 
for the special case of binary groups, an entropy-factor $S_2$: 

\[ S_2 = -k \sum_i \sum_k p_{ik} \log p_{ik}. \tag{16} \]

In an analogous process we construct, in the case of groups of a higher order, entropy-factors 
of a higher order 
\[ S_n = -k \sum_i \sum_k \cdots \sum_n p_{ik\ldots n} \log p_{ik\ldots n}. \tag{17} \]

For the entropy-factors $S_2$ we get the results shown in Table 13 (we repeat the results for 
$S_1$ from I in the same table). 


Table 13. Entropy values of the first and the second order 

Author        Work               $S_1$    $S_2$
Shakespeare   Othello            0.297    0.593
Rilke         Cornet             0.385    0.767
Hesse         Steppenwolf        0.499    0.974
Mann          Buddenbrooks       0.505    1.009
Mann          Zauberberg         0.516    1.030
Carossa       Geheimnisse        0.501    1.009
Jaspers       Der phil. Glaube   0.567    1.126

Table 13 shows that $S_2$ is very nearly equal to $2S_1$. It can easily be shown that this 
relation holds exactly if the formation of the $(i, k)$-pairs is stochastically independent and 
that in the case of stochastic independence generally, $S_n = nS_1$. 
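
The exact relation under independence can be checked numerically. In the Python sketch below the marginal distribution is hypothetical, the constant $k$ is set to 1, and logarithms are taken to base 10; the base and the value of $k$ are assumptions of this example, and the identity holds for any choice of them.

```python
import math


def entropy(probabilities, k=1.0):
    """Entropy factor -k * sum(p * log10(p)); zero probabilities contribute nothing."""
    return -k * sum(p * math.log10(p) for p in probabilities if p > 0)


marginal = [0.7, 0.2, 0.1]                                  # hypothetical p_i
joint = [p_i * p_k for p_i in marginal for p_k in marginal]  # p_ik = p_i * p_k
s1 = entropy(marginal)
s2 = entropy(joint)
```

The identity follows because $\log(p_i p_k) = \log p_i + \log p_k$ and each marginal sums to one.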

The definition of the entropy-factor $S_2$ is not very well suited to bring out the slight (but very 
significant) deviations of our special $p_{ik}$-matrices from stochastic independence. The 
differences of the values $S_2$ for different literary works are drastic and significant, but not 
much more so than the differences of the values $S_1$. The concept of the entropy $S_2$ will, 
however, become much more interesting if text elements and properties of text elements 
are studied which lead to $p_{ik}$-matrices with very noticeable stochastic dependencies. 


3. FERNORDNUNG 


In the previous sections the Nahordnungs relations of a text were studied. We can, theoretically, 
describe the relations within groups of any number $n$ of elements by matrices 
$p_{ik\ldots n}$ of the corresponding order. If we try, however, to compute, in practice, the characteristic 
matrices of the order three or four and to calculate the corresponding characteristic 
factors, the amount of work increases enormously and therefore the method becomes, 
outside the Nahordnungs region, practically uninteresting. To study the question, whether 
at all and to what extent exact relations between distant elements or groups of elements 
in a text exist, we have to look for other methods. 

First of all, we no longer break up the text, but we consider it as it is given by its author. 
Then, we regard sets of 2, 3, ... elements, which we assume no longer to be arranged in 
the immediate neighbourhood of one another; there can be (but not necessarily must be) 
other elements in between the 2, 3, ... elements of a set. The distances $l$ for two special 
marks (properties) $i$ and $k$, for instance, of two consecutive elements of one set may be 
called $l_{ik}$. 

Now we procure the frequency distribution of the $l_{ik}$ for the whole of the text. If the 
values for the properties $i$ and $k$ of the elements vary according to 

\[ 1 \le i \le I, \qquad 1 \le k \le I, \]

there exist $I^2$ frequency distributions of this sort. Starting from these distributions, we 
derive characteristic matrices and characteristic factors, suitable to study the relationship 
between distant parts of a text. For an illustration of the concept of $l_{ik}$ cf. § 3-2. 

Further, from another general point of view, we are going to study the question of possible 
relations between two or more elements, either part of them or all of them neighbouring or 
distant, using functions of influence and correlation coefficients. With the help of the latter 
we define a characteristic distance Z that can be compared, in a physical analogy, to the 
mean free paths of the molecules of a mixture of chemical elements or to the range of forces 
between the elementary particles of a solid body or a gas. 


3-1. Frequency distributions of element distances 


We study a text with the elements $e$. Let the elements be characterized by marks called
$i, k, \ldots$. Now consider two elements $e_i$ and $e_k$, which may or may not be consecutive text
elements, $i$ and $k$ being now special values out of the number $I^2$ of possible values for the
$(i, k)$-combinations. The distance between the special elements $e_i$ and $e_k$ will be denoted
by $l_{ik}$.

Now there obviously exists a frequency distribution for the distances $l_{ik}$ defined in this
special way. The number of frequency distributions of this kind amounts to $I^2$, as shown
before.

The $l_{ik}$-distributions altogether comprise the $p_{ik}$-matrices studied in § 2, the $l_{ik}$-values
of which are exclusively equal to 1. In this special case we will hence describe the frequency
distributions $p_{ik}$ as ${}^{1}p_{ik}$.

In exactly the same way as we have studied the ${}^{1}p_{ik}$-matrices in the earlier sections,
we can calculate the characteristic factors ($M$, $\xi$, $\rho$, $S$, etc.) of the matrices
${}^{1}p_{ik}, {}^{2}p_{ik}, \ldots, {}^{\alpha}p_{ik}$, etc., the $l_{ik}$-values of which are $1, 2, \ldots, \alpha$ respectively,
i.e. the matrices

$${}^{\alpha}p_{ik} = ({}^{1}p_{ik}, {}^{2}p_{ik}, \ldots, {}^{A-1}p_{ik}),$$

$A$ being the total number of elements of the text.
In doing this, we proceed, step by step, from the relations between the properties of
consecutive elements to relations between the properties of increasingly distant elements.

In this section we have, so far, only referred to matrices of the second order. In quite
an analogous way it is possible to define matrices of a higher order, say $n$, with which there
exist, obviously, $(n-1)(A-n+1)$ of the '$\alpha$-possibilities' just described. Therefore we
have to write for these matrices of the 3, 4, ..., $n$th order

$${}^{\alpha_1 \alpha_2 \ldots \alpha_{n-1}}p_{ikl \ldots n}.$$

With the words as elements and the syllables as properties we expect

$${}^{\alpha_1 \alpha_2 \ldots \alpha_{n-1}}p_{ikl \ldots n} \rightarrow p_i \cdot p_k \cdot p_l \cdots p_n$$

for increasing $\alpha$-values.


3-2. The matrix l,, of the mean distances and the matrix r,, of the 
reciprocal values of the mean distances 


We shall now define several expressions, derived from the ${}^{\alpha}p_{ik}$-matrices, more interesting
than the mass of the material contained in the $A-1$ matrices ${}^{\alpha}p_{ik}$. We begin by defining
matrices of the mean distances.

To do this, the text offers various possibilities. If we denote the frequency distributions
of the $l_{ik}$ by the letter $H$ and the corresponding distribution of the relative frequencies by

$$h(l_{ik}) = \frac{H(l_{ik})}{\sum_{l_{ik}} H(l_{ik})}, \tag{18}$$

we can define a mean value $\bar{l}_{ik}$ for each combination $(i, k)$ of properties according to

$$\bar{l}_{ik} = \sum_{l_{ik}} l_{ik} h_{ik}, \quad \text{with } h_{ik} = h(l_{ik}). \tag{19}$$

This construction, however, is obviously much too clumsy to be suited for our purposes,
especially as the values of $H(l_{ik})$ do not decrease for increasing $l_{ik}$, so that the mean value
$\bar{l}_{ik}$, as defined in (19), becomes somewhat trivial.

Therefore we define another matrix $\bar{l}_{ik}$ in the following way. Let us consider the first
$i$-syllabled word of the text and then go along the text till we meet, for the first time, a word
with $k$ syllables. Let the distance between these two words be called $l_{ik}$. Then we consider
the second $i$-syllabled word of the text and repeat the process, and thus we find a second
value $l_{ik}$. To distinguish these two $l_{ik}$-values, we call them $l_{ik}^{(1)}$ and $l_{ik}^{(2)}$ respectively.
Proceeding in this way through the whole extent of the text we find a number, let us say $z_{ik}$,
of $l_{ik}$-values. The different $l_{ik}$-values may be denoted generally as $l_{ik}^{(\nu)}$. We compute their
mean value

$$\bar{l}_{ik} = \frac{1}{z_{ik}} \sum_{\nu=1}^{z_{ik}} l_{ik}^{(\nu)}. \tag{20}$$





This matrix describes a set of values which is more suitable for the purposes of our
problem than the $\bar{l}_{ik}$ defined in (19). For obvious reasons we also consider the matrix $r_{ik}$
of the reciprocal values of the $\bar{l}_{ik}$:

$$r_{ik} = \frac{1}{\bar{l}_{ik}}. \tag{21}$$
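The construction of (20) and (21) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text is represented as a list of marks (for instance syllable counts per word), the distance is taken as the difference of position numbers, and an $i$-marked element with no later $k$-marked element is simply skipped. The function name `mean_distance_matrix` is hypothetical and not taken from the paper.

```python
from collections import defaultdict

def mean_distance_matrix(marks):
    """Mean forward distance from each i-marked element to the next k-marked one.

    Returns (l_bar, r): dictionaries keyed by (i, k) holding the mean distance
    of equation (20) and its reciprocal as in equation (21).
    """
    totals = defaultdict(int)   # sum of the distances observed for each (i, k)
    counts = defaultdict(int)   # number z_ik of distances observed for each (i, k)
    next_pos = {}               # nearest strictly later position of every mark

    # Scan backwards so that next_pos always refers to positions after `pos`.
    for pos in range(len(marks) - 1, -1, -1):
        i = marks[pos]
        for k, later in next_pos.items():
            totals[(i, k)] += later - pos
            counts[(i, k)] += 1
        next_pos[i] = pos

    l_bar = {ik: totals[ik] / counts[ik] for ik in totals}
    r = {ik: 1.0 / value for ik, value in l_bar.items()}
    return l_bar, r


# Toy sequence of syllable counts (made up for illustration).
l_bar, r = mean_distance_matrix([2, 1, 2, 7, 1])
print(l_bar[(2, 7)], r[(2, 7)])
```

For $i = k$ the sketch measures the distance to the next later occurrence of the same mark, which is the reading suggested by the treatment of the diagonal terms in § 3-3.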


There are other possibilities for the construction of interesting types of distance-matrices, 
out of which we have considered, so far, only one. Another one can be explained with the 
help of the following example: 


aa awane «ss « e FT ) = we Mw a a me: heh Cw 
No. of syllables » RTRs ee 2 2 4 2 5 7 6 


~ 
According to the definitions given in equation (20) we calculate
6+8+54+44+2 24 
Le, aad al 


Le 
cd = 


Zon 5 6 





= 4-8. 


In this way the mark 7 in position 6 enters once, while the mark 7 in position 15 enters four
times into our calculation. We now exclude all the $l_{ik}$-values with the exception of the
minimum values $l_{ik}$ with regard to each one of the numbers 7 in the two positions of our
example; thus we obtain





Whenever we calculate the distance matrices or the $r_{ik}$-matrices in the latter way, we denote
this by writing $i'$ in the summations and $\bar{l}'_{ik}$ and $r'_{ik}$ for the matrices.
In Table 14 we give an example of an $r'_{ik}$-matrix for Rilke’s Cornet.


Table 14. $r'_{ik}$-matrix for Rilke’s Cornet

 i \ k      1        2        3        4
   1     0.6231   0.9251   0.6735   0.6275
   2     0.9391   0.2922   0.3212   0.2700
   3     0.6774   0.3410   0.0594   0.0697
   4     0.6600   0.2432   0.0729   0.0130






















3-3. Relations between the $r_{ik}$-matrix and characteristic factors defined before


As pointed out in I, there exist simple relations between the $\bar{l}_{ik}$-matrices, interpreted
especially for the words as elements and the numbers of syllables per word as the properties
of the elements, and the relative frequency distributions of the number of syllables per
word as given in the examples of I. This holds obviously for the $r_{ik}$-matrices too.

If we consider, for instance, the one-syllabled words only, their mean distance apart in
the text is given by the length of the whole text, equal to the number $A-1$ of the sum total
of the intervals between consecutive words, divided by the number $A_1$ of the one-syllabled
words (if we neglect certain small regions at the beginning and the end of the text). Thus

$$\bar{l}_{11} = \frac{A-1}{A_1}. \tag{22}$$

Therefore, the value of $r_{11}$ becomes

$$r_{11} = \frac{1}{\bar{l}_{11}} = \frac{A_1}{A-1} = p_1\left(1 + \frac{1}{A} + \frac{1}{A^2} + \ldots\right). \tag{23}$$

Under the condition $A \gg 1$, which must obviously be fulfilled, we get generally

$$r_{ii} = \frac{1}{\bar{l}_{ii}} = p_i. \tag{24}$$

The diagonal terms of the $r_{ik}$-matrix and also of the $r'_{ik}$-matrix are equal to the relative
frequency distributions calculated in I. Therefore

$$\sum_i r_{ii} = \sum_i p_i = 1. \tag{25}$$


Biometrika 41 9 
Since it is the mean values of the $l_{ik}$ which enter into the matrices $\bar{l}_{ik}$, we may question
whether or not there are relations between all the $\bar{l}_{ik}$-values and the relative frequency
distributions $p_i$ of I, i.e. beyond the very simple ones just pointed out, so that not only the
$\bar{l}_{ii}$ but also the $\bar{l}_{ik}$, for which $i \ne k$, could be calculated as functions of the $p_i$. It is obviously
possible to construct artificial types of texts, for instance, binary texts (texts with not more
than two marks) with strict periodicities, which allow us to calculate all the elements of
the $\bar{l}_{ik}$-matrix as functions of the relative frequencies $p_i$. But it is easily seen that this cannot
hold generally.

The $r_{ik}$-matrix cannot, it is true, be interpreted as a sort of a frequency distribution in
the statistical sense of the word; we can, nevertheless, study the properties of the matrix by
defining factors quite analogous to the factors we defined, for instance, for the ${}^{1}p_{ik}$-matrix
of the first part of this paper. We may also regard the $r_{ik}$-matrix as representing a mass
distribution in a plane co-ordinate system, having $i^*$ and $k^*$ as the co-ordinates of the
mass centre:

$$i^* = \sum_{i,k} i\, r_{ik}, \qquad k^* = \sum_{i,k} k\, r_{ik}, \tag{26}$$

and with the moments $\theta_{11}$ and $\theta_{22}$ and the product $\theta_{12}$ of inertia as already described before.
For our example in Table 14, we find the following values:

$$i^* = 13.893, \qquad \theta_{11} = 924.38,$$
$$k^* = 13.794, \qquad \theta_{12} = 946.61,$$
$$\theta_{22} = 950.61.$$

After the well-known transformation to principal axes, we get the principal moments of
inertia

$$\theta_{\mathrm{I}} = 1884.2, \qquad \theta_{\mathrm{II}} = -9.21.$$
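The transformation to principal axes amounts to finding the two eigenvalues of the symmetric matrix formed by the moments and the product of inertia. The following sketch uses the closed form for a 2 x 2 symmetric matrix; the function name `principal_moments` is a hypothetical helper, and the assignment of 946.61 to the product of inertia is an assumption made here because it is the only reading of the quoted values that reproduces the quoted principal moments.

```python
import math

def principal_moments(t11, t12, t22):
    """Eigenvalues of the symmetric matrix [[t11, t12], [t12, t22]]."""
    mean = 0.5 * (t11 + t22)                      # half the trace
    radius = math.hypot(0.5 * (t11 - t22), t12)   # distance of eigenvalues from the mean
    return mean + radius, mean - radius

# Moments quoted in the text for the matrix of Table 14.
theta_I, theta_II = principal_moments(924.38, 946.61, 950.61)
print(round(theta_I, 1), round(theta_II, 2))
```

The two results agree with the printed principal moments to the precision given there, and their sum equals the sum of the two moments of inertia, as it must.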
We further compute the values $\xi_{\mathrm{I}}$, $\xi_{\mathrm{II}}$, $\xi$ and $\rho$ in analogy to equations (10), (11), (12) and (14):

$$\xi = 4.899 \times 10^{-2}, \qquad \xi_{\mathrm{I}} = 1.382,$$
$$\xi_{\mathrm{II}} = 2.771, \qquad \rho = 1.768 \times 10^{-2}.$$

In analogy to $S$, defined by (16), we compute

$$S_r = -k \sum_{i,k} r_{ik} \log r_{ik}, \qquad S_r = 1.6984. \tag{27}$$

It is understood, however, that this factor $S_r$ cannot be interpreted as simply as the factor
of equation (16) in § 2-6.


3-4. Interaction between distant parts of a text and correlation coefficients 


Consider the elements of a text in their original order. Let the position number of the
elements be called $x$ and the properties of the elements be regarded as functions $f(x)$. Now
consider properties of the same sort at two different positions $x_i$ and $x_k$ of the text, i.e. the
functions $f(x_i)$ and $f(x_k)$. We assume that there exists an 'influence' between the functions
at different places. The influence of $f(x_k)$ on the element in the place $x_i$ will be formulated
with the help of an influence function $\phi(x_i; x_k)$, as follows:

$$\phi(x_i; x_k) f(x_k).$$


4): 


WILHELM Fucks 131 


Similarly, we expect an influence from elements in other places $x_k$ of the text on the
element in the position $x_i$. If the sum total of all these influences (and nothing else)
determines the value of $f(x_i)$, we may write

$$f(x_i) = \sum_{k \ne i} \phi(x_i; x_k) f(x_k), \tag{28}$$

which, in a calculus with continuous variables, amounts to an integral equation.

We conclude that our characteristic numbers concerning Nahordnung and, still more, 
Fernordnung will turn out to be of decisive value for the exact description of the peculiarities 
of the style of a special literary work, if we can expect that a certain property of the text 
elements at a place x, influences the properties of elements at distant places. 

If such influences between distant text elements exist, we may speak, in the realm of
statistics, of correlative relations, which can be formulated exactly by correlation
coefficients. One of these has been made use of in § 2-4. It is based on the moments of the second
order of a frequency distribution:

$$\frac{\theta_{12}}{\sqrt{\theta_{11}\theta_{22}}}.$$





3-5. Application of correlation coefficients on literary texts and the definition 
of the characteristic distance L 
Consider, as in the previous section, a text element in the position $i$ with the mark $f_i$ and
a text element in the position $k = i+l$ with mark $f_k$, and form the correlation coefficient $q_l$


fi-Susa 
q; 6? (29) 
MGB FB 
pe 4 1 4-2 — 1 4-l 
ee fhiua= 7a 2 fit R= qa ze iy (30) 


(with $A$ = the number of elements of the whole text). The correlation coefficient $q_l$ is a
function of the value of $l$.
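As an illustration of a correlation coefficient that depends on the distance $l$, the sketch below computes the ordinary product-moment correlation between the marks $f_i$ and $f_{i+l}$, with the averages taken over the $A - l$ available pairs. This is an assumption about the intended form of (29) and (30), which are not fully legible here, and the function name `lag_correlation` is hypothetical.

```python
import math

def lag_correlation(f, lag):
    """Product-moment correlation between f[i] and f[i + lag] over all valid i."""
    a = f[:len(f) - lag]          # marks f_i
    b = f[lag:]                   # marks f_{i+l}
    n = len(a)                    # A - l pairs
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    # Undefined (division by zero) if either subsequence is constant.
    return cov / math.sqrt(var_a * var_b)

# A strictly periodic toy "text": marks alternate with period 2.
marks = [1, 2] * 10
print(lag_correlation(marks, 1), lag_correlation(marks, 2))
```

For such a strictly periodic sequence the coefficient alternates between the extremes as the distance grows, whereas for marks without interaction between distant places it would be expected to stay near zero for every distance.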


Now define a characteristic distance by 


1 
L= yy = U- (31) 


If there exist binding forces (influences) within the structure of a text, according to our 
concepts of Nahordnung and Fernordnung, we can make a distinction between short-range 
and long-range forces, comparable to the cohesion forces and the electric forces in physics, 
respectively. The value of the characteristic length L represents an over-all measure of 
the range of the binding forces of the whole of a text. This factor L, therefore, gives the 
exact answer to one of the most interesting questions of our problem, namely, whether 
there exist interactions (binding forces) within the structure of a given text, and if they do 
exist, of what range they are exactly. 

As pointed out before, it would not be of much interest to compute correlation coefficients 
or characteristic distances L with respect to properties of text elements, as to which we 
cannot expect any noticeable interaction between distant parts of the text. If, for instance, 
we should choose, in a prose text, the words as elements and their syllable numbers as their 


properties, the $q_l$-values could not be expected to bring out especially characteristic
distinctions about the style of different authors. In this case the value of the distance $L$ would
turn out to be invariably in the neighbourhood of unity. It is easily seen that useful results
can be expected only for elements and properties with respect to which a fairly strong
influence between distant parts of the text is to be expected, i.e. for instance, for metric
elements in a text written in verse, or for a text written in prose, but (stochastically)
interspersed with noticeable parts of poetic language.

It will be shown, however, that the definition of style characteristics, especially suitable 
for texts of these types, requires a new approach. Therefore, non-trivial examples of corre- 
lation coefficients defined as in equation (29) and the characteristic length L will be given 
in another communication. 
A DISTRIBUTION-FREE k-SAMPLE TEST AGAINST
ORDERED ALTERNATIVES 


By A. R. JONCKHEERE 
Department of Psychology, University College, London 


1. INTRODUCTION


An experimenter is frequently able to state, on the basis of experience or theory, the 
expected rank order of magnitude of the effects of different experimental treatments. 
Suppose, for example, that an experiment is performed to determine the effect of different 
degrees of stress upon the performance of some task of manual dexterity. The data for 
analysis might consist of the number of errors made by four different groups of subjects, 
each working under one of the conditions: high, medium, low, or minimal stress. The experi- 
menter hopes to refute the hypothesis that the four samples of data could be considered as 
randomly drawn from the same population. However, experience or theory leads him to
expect that the number of errors made increases with the degree of stress (his alternative
hypothesis). Hence he desires a test of significance especially sensitive to those differences
which, while tending to reject the null hypothesis that the samples are all drawn from
the same population, at the same time lend support to the specified alternative in
question.

The customary one-way analysis of variance does not satisfy this demand: the F-ratio 
is independent of the order in which the group means occur. Furthermore, attempts to 
combine the probabilities yielded by an F-ratio and some coefficient of rank correlation 
between the expected and the obtained order of the group means are faced with a number of 
difficulties. In the first place, it is necessary to combine a continuous and a discrete pro- 
bability distribution, a procedure requiring special consideration and some arduous com- 
putation for each case as it arises (see, for example, F. N. David (1947)). In the second 
place, a complication arises owing to the fact that, because of the essential discontinuity 
of the measuring scales we have to employ in practice, the F-ratio may have a determinable 
value even when several of the group means are identical. Should this occur, the appropriate 
sampling distribution of the rank correlation between the predicted and the obtained rank 
order of means becomes troublesome. A similar difficulty arises when attempts are made to
combine the probabilities yielded by the $\chi^2$ criterion and the rank correlation of the ratios
involved in a $2 \times k$ contingency table, as suggested by Hotelling & Pabst (1936). In this
latter case, if the possibility of ties in the values of the ratios in a $2 \times k$ contingency table is
excluded, then the values of $\chi^2$ and the rank correlation are no longer independent for small
samples. The difficulty in these situations is that while we can fairly easily determine the
distribution of rank correlations for a fixed number and extent of ties, what seems to be 
required is the distribution when any manner of ties may occur, including the case where 
there are none. For unless there were other evidence at hand, it would be unreasonable to 
suppose that future repetitions of the same experiment would lead in every instance to 
exactly the same number and extent of ties. This complication, whatever may be the other 
merits of ‘conditional’ statistics, goes in fact farther than the present context, and is believed 
to render difficult any rational account of the significance tests commonly applied when
ties occur in rankings*. 

A further aspect of many experiments is that there appears to be no suitable metric by
which the different treatments to be investigated may be characterized. All that an
experimenter is able to assert, particularly in psychology, is that they may be ranked in some
order, such as increasing ‘stress’. When this is the case, it is no longer possible to use any
form of regression analysis, since the independent variate is not suitably quantified.

We have then the problem of assessing whether the results obtained from a one-way 
analysis of variance design are ‘significant’ and in conformity with a hypothesis as to the 
rank order for the size of the effects yielded by different treatments. When there are only 
two treatments under consideration, the problem is recognized and treated in terms of the 
usual distinction made between single- and double-tailed tests of significance. If the 
experimenter is indifferent as to which of the two treatments will give the larger results, 
a double-tailed test is employed. If, on the other hand, it is expected that a particular 
treatment will be more effective than the other, an appropriate single-tailed test is used. 
The proposed test attempts to extend this distinction to situations in which there are more 
than two groups by specifying one out of the many alternatives to the null hypothesis. 

Such a test has obvious applications in the study of time series. Suppose, for instance, that
on each of $n$ successive occasions, any one of $k$ events may occur. Then we can test the
hypothesis that the $k$ events occur randomly in the series of $n$ occasions, against the
alternative that they tend to occur in a particular ordered time sequence. For example, the
births of siblings might be of three types: normal, abnormal in respect of some characteristic, 
and stillbirths. The effect of birthrank on the type of birth could be tested here when, for 
instance, the alternative to randomness is that the earlier births tend to be normal, the 
later births abnormal, with finally the appearance of stillbirths. The application here is 
an extension of the method employed by Haldane & Smith (1948) for a similar situation, 
but where there are only two alternatives for characterizing the successive births. 


2. THE TEST PROPOSED 


Let $(X_{11}, X_{12}, \ldots, X_{1m_1}), \ldots, (X_{i1}, \ldots, X_{i\alpha_i}, \ldots, X_{im_i}), \ldots, (X_{k1}, \ldots, X_{km_k})$ be $k$ samples of size
$m_1, m_2, \ldots, m_i, \ldots, m_k$, randomly drawn from populations with continuous cumulative
distributions $F_1(X), F_2(X), \ldots, F_i(X), \ldots, F_k(X)$ respectively, and arranged such that the first
suffix of the $X$'s is in the order implied by the alternative hypothesis

$$F_1(X) < F_2(X) < \ldots < F_i(X) < \ldots < F_k(X) \quad \text{for all } X.$$

Generally, if $X_{i\alpha_i}$ is the $\alpha_i$th value in the $i$th sample drawn from a population with c.d.f.
$F_i(X)$, we wish to test the hypothesis that $F_i(X) = F_j(X)$, $(i, j = 1, \ldots, k;\ i \ne j)$, against the
alternative that $F_i(X) < F_j(X)$ $(i < j)$ for all $X$. Let

$$p_{i\alpha_i, j\alpha_j} = 1 \ \text{if } X_{i\alpha_i} < X_{j\alpha_j},$$
$$p_{i\alpha_i, j\alpha_j} = 0 \ \text{if } X_{i\alpha_i} > X_{j\alpha_j},$$

where $i = 1, \ldots, k-1$; $j = 1+i, \ldots, k$; $\alpha_i = 1, \ldots, m_i$; $\alpha_j = 1, \ldots, m_j$.†


* Though the test to be presented involves the distribution of a rank correlation with ties, the 
above-mentioned difficulties do not apply. The number and extent of ties is controlled by the number 
and sizes of samples, characteristics which are fairly readily reproducible in a sequence of experiments. 

† See note added on p. 143 below.
Let

$$p_{ij} = \sum_{\alpha_i=1}^{m_i} \sum_{\alpha_j=1}^{m_j} p_{i\alpha_i, j\alpha_j},$$

and finally, let

$$S = 2\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} p_{ij} - \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} m_i m_j. \tag{1}$$


It is the statistic S that we propose to use for testing the null hypothesis against the 
alternative of ordered cumulative distribution functions.* 

A short numerical example may make the method of computation of S clearer while at 
the same time showing the use of the test. Suppose that the following values are obtained in 
an experiment involving four independent groups: 


          I       II      III       IV
         19       21       40       49
         20       61       99      110
         60       80      100      151
        130      129      149      160
Mean  57.25    72.75    97.00   117.50


The experimenter wishes to test the hypothesis that the four samples have come from the 
same population against the alternative that the populations are such that the values from 
the samples I, II, III, IV are in an expected order of increasing value. It will be noted that 
the arithmetic means are in the predicted order. For the computation of $S$ we have

$$m_1 = m_2 = m_3 = m_4 = 4, \quad k = 4, \quad p_{12} = 11, \quad p_{13} = 12, \quad p_{14} = 13,$$
$$p_{23} = 11, \quad p_{24} = 12, \quad p_{34} = 12.$$

Hence, using (1), $S = 2 \times 71 - 96 = 46$.

It will be seen that the computation is quite simple. The samples are ranged in the order 
implied by the alternative hypothesis under consideration. Then, for each sample in turn, 
we determine for each value the number of items which are larger in all the succeeding 
samples. This gives sums of the values of $p_{ij}$. Employing formula (1), we can then readily
obtain the appropriate value of $S$.
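The counting procedure just described can be sketched as follows. The function name `jonckheere_s` is a hypothetical label; the samples are assumed to be supplied already arranged in the order implied by the alternative hypothesis, and to contain no tied observations between different samples, as the continuity assumption implies.

```python
def jonckheere_s(samples):
    """Statistic S of formula (1) for samples listed in the predicted order."""
    concordant = 0   # sum of the p_ij over all pairs of samples i < j
    pairs = 0        # sum of m_i * m_j over all pairs of samples i < j
    for i in range(len(samples) - 1):
        for j in range(i + 1, len(samples)):
            # p_ij: number of (x, y) with x in sample i smaller than y in sample j
            concordant += sum(1 for x in samples[i] for y in samples[j] if x < y)
            pairs += len(samples[i]) * len(samples[j])
    return 2 * concordant - pairs

# Data of the numerical example in the text (groups I to IV).
groups = [
    [19, 20, 60, 130],
    [21, 61, 80, 129],
    [40, 99, 100, 149],
    [49, 110, 151, 160],
]
print(jonckheere_s(groups))
```

Applied to the example data, the double loop accumulates the six values $p_{12}, \ldots, p_{34}$ listed above, whose total is 71, together with the 96 between-sample comparisons.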

The value of $S$ yielded by the above computation is the same as that given by the ordinary
procedure for the calculation of Kendall’s $S$ between two rankings, when one of them
contains ties (Kendall, 1948). In fact, dividing the value of $S$ by the maximum value it
can have (when ties are taken into account) gives the usual $\tau$ coefficient of rank correlation.
Here

$$\tau = \frac{S}{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} m_i m_j},$$

and with the example given, $\tau = \frac{46}{96} = 0.48$. In the present test the closer $\tau$ is to unity, the
more the sample data are in conformity with the ordered hypothesis alternative to that of
randomness.

Employing the probability tables given later, we find $\Pr[S \ge 46] = 0.0168$, and thus the
experimenter would with some confidence reject the null hypothesis, and accept an
alternative that the samples came from populations which were stochastically ordered in the
series I, II, III, IV.

* If $F_i(X) < F_j(X)$ for all $X$, then it is obvious that if $M_i$ and $M_j$ are the medians of the distributions,
$M_i < M_j$. On the other hand, the shapes of the two distributions may be identical. It is believed that
the present test is more sensitive to differences of location between the population distributions than
to differences of scale or skewness.
It is interesting to observe that if an analysis of variance is performed on the data of this
example we obtain, as a between-to-within-groups ratio, $F_{3,12} = 1.216$ and


Hence the experimenter would have accepted with some confidence the null hypothesis 
that the four samples came from populations having the same means. The difference in the 
significance levels yielded by the two tests is, of course, explained by the greater variety of 
alternatives to the null hypothesis allowed for by analysis of variance techniques. 
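For comparison, the between-to-within-groups ratio of the one-way analysis of variance can be recomputed from the example data. The sketch below is a plain implementation of the usual sums of squares; the function name `f_ratio` is hypothetical, and no probability level is computed.

```python
def f_ratio(samples):
    """Between-to-within mean-square ratio of a one-way analysis of variance."""
    n = sum(len(s) for s in samples)          # total number of observations
    k = len(samples)                          # number of groups
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    # Degrees of freedom: k - 1 between groups, n - k within groups.
    return (ss_between / (k - 1)) / (ss_within / (n - k))

groups = [
    [19, 20, 60, 130],
    [21, 61, 80, 129],
    [40, 99, 100, 149],
    [49, 110, 151, 160],
]
print(round(f_ratio(groups), 3))
```

With four groups of four observations the ratio has 3 and 12 degrees of freedom, and its value agrees with the figure quoted in the text to three decimals.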


3. PREVIOUS WORK 


The test just described is a simple extension to several groups of a procedure which has 
been frequently recommended for the problem of deciding whether two samples have come 
from the same population. Wilcoxon (1945) originally gave a form of this test, where the 
statistic to be computed was the sum of ranks of the items in one of the two samples, and 
supplied a few probability levels. When more than two samples are considered there appears 
to be no simple method for computing the value of S in terms of the sums of the ranks of 
the items in each sample. Festinger (1946) provided tables for samples of different size, but 
the asymptotic distribution was investigated only later by Mann & Whitney (1947), who 
besides showing that the normal distribution is the limiting form when both sample sizes 
increase without limit, also showed that the test was consistent with respect to the
one-sided alternative $F_1(X) < F_2(X)$. By investigating the differential of the power function of
the test, van der Vaart (1950) was able to show that the test was unbiased against a one-sided 
alternative. The power of the test has been recently discussed by Sundrum (1953). In a 
different context Haldane & Smith (1948) employed the same test for the detection of birth- 
order effects, giving the general expression for the cumulants of the statistic. 

An attempt to extend the test to more than two groups has been made by Whitney (1951), 
who considered the distribution of a statistic related to S for the case of three groups. Un- 
fortunately, by the use of his test, there is a large amount of information lost concerning the 
interrelations of the ranks of the items between the three groups, which is relevant to 
accepting the hypothesis alternative to randomness that the three populations are stochastic- 
ally ordered in increasing value. 

There have of course been many attempts to devise tests based on ranks which are in some 
sense analogous to the usual one-way analysis of variance procedures. Kruskal & Wallis 
(1952) give an account of these methods, but in so far as they do not specifically take as 
alternative to the null hypothesis, a hypothesis that the populations from which the samples 
are drawn are stochastically ordered, they differ from the test proposed here, and their limit 
distributions tend to be of Type III form. 

K. Iyer (1952), in the course of two articles, briefly developed a k-sample test employing 
essentially the same statistic as S, but did not discuss the situation for which the test seems 
to be most suitable. As recommended by Iyer, the test does not appear to be very sensitive 
against the general alternatives he apparently had in mind. 


4. THE SAMPLING DISTRIBUTION OF S
In view of the identity of the statistic employed in this test and Kendall’s $S$ coefficient of
rank correlation when one of the rankings contains ties, the derivation of the appropriate
sampling distribution will be carried out in terms of ranks.

Consider $n$ ranked items such that there are $k$ sets of ties, of extent $m_r$ $(r = 1, 2, \ldots, k)$
and $\sum_{r=1}^{k} m_r = n$. Then according to the procedure employed in the previous section for
computing the value of $S$ between these tied and $n$ untied ranks, a particular set of ties of extent
$m_r$ will make no contribution to the total value of $S$, since no internal comparisons are made.
If, however, they were untied, their contribution to the total would be equivalent to the
value of $S$ yielded by an ordinary untied ranking of $m_r$ items.
Let $f(n)$ stand for the probability-generating function of the values of $S$ when both
rankings consist of $n$ untied items. Then, from the above reasoning, it follows that

$$f(n) = \prod_{r=1}^{k} f(m_r) \cdot f(m_1, m_2, \ldots, m_k),$$

where $f(m_1, m_2, \ldots, m_k)$ stands for the probability-generating function of values of $S$ when
one of the rankings consists of $n$ untied items and the other contains ties as in the general
case mentioned above.

Hence we have

$$f(m_1, m_2, \ldots, m_k) = \frac{f(n)}{\prod_{r=1}^{k} f(m_r)}. \tag{2}$$

Now Kendall has shown (1948) that in a universe of equiprobable permutations of ranks,

$$f(n) = (n!)^{-1} \prod_{s=1}^{n} \left(X^{-(s-1)} + X^{-(s-3)} + \ldots + X^{(s-3)} + X^{(s-1)}\right).$$
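Making the substitution in (2) we have

$$f(m_1, \ldots, m_k) = \frac{(n!)^{-1} \prod_{s=1}^{n} \left(X^{-(s-1)} + X^{-(s-3)} + \ldots + X^{(s-3)} + X^{(s-1)}\right)}{\prod_{r=1}^{k} \left\{(m_r!)^{-1} \prod_{s=1}^{m_r} \left(X^{-(s-1)} + X^{-(s-3)} + \ldots + X^{(s-3)} + X^{(s-1)}\right)\right\}}. \tag{3}$$

Putting $X = e^{it}$ in this last expression, and employing substitutions made by Silverstone
(1950) and Moran (1950) in a similar type of derivation, gives the characteristic function

$$\phi(t) = \frac{(n!)^{-1} (\sin t)^{-n} \prod_{s=1}^{n} \sin (st)}{\prod_{r=1}^{k} \left\{(m_r!)^{-1} (\sin t)^{-m_r} \prod_{s=1}^{m_r} \sin (st)\right\}},$$

and, making use of the relation

$$\log \sin x - \log x = -\sum_{\alpha=1}^{\infty} \frac{2^{2\alpha-1} B_\alpha}{\alpha (2\alpha)!} x^{2\alpha},$$

we get as the cumulant-generating function

$$K(t) = \log \phi(t) = \sum_{\alpha=1}^{\infty} \frac{2^{2\alpha-1} B_\alpha}{\alpha (2\alpha)!} t^{2\alpha} \left(n - \sum_{p=1}^{n} p^{2\alpha}\right) - \sum_{r=1}^{k} \sum_{\alpha=1}^{\infty} \frac{2^{2\alpha-1} B_\alpha}{\alpha (2\alpha)!} t^{2\alpha} \left(m_r - \sum_{p=1}^{m_r} p^{2\alpha}\right),$$

where $B_\alpha$ is the $\alpha$th Bernoulli number ($B_1 = \frac{1}{6}$, $B_2 = \frac{1}{30}$, ...). Now $\kappa_s$, the $s$th cumulant of
the distribution of $S$, is given by the coefficient of $(it)^s / s!$ in this series. Hence all the
odd cumulants and moments are zero, while

$$\kappa_{2\alpha}(m_1, \ldots, m_k) = (-1)^{\alpha+1} \frac{2^{2\alpha-1} B_\alpha}{\alpha} \left\{\sum_{p=1}^{n} p^{2\alpha} - \sum_{r=1}^{k} \sum_{p=1}^{m_r} p^{2\alpha}\right\}, \tag{4}$$

where $\kappa_{2\alpha}(m_1, m_2, \ldots, m_k)$ stands for the general even-order cumulant of the distribution of
$S$ when one of the arrays of ranks contains sets of ties of any possible extent and number.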
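The ratio (3) can be evaluated exactly for given group sizes. A minimal sketch, under the following assumptions: writing $q = X^2$, each factor $X^{-(s-1)} + \ldots + X^{(s-1)}$ is $X^{-(s-1)}(1 + q + \ldots + q^{s-1})$, so the ratio reduces to a quotient of products of the polynomials $1 + q + \ldots + q^{s-1}$, and a term $q^e$ of the quotient corresponds to the value $S = 2e - \sum_{i<j} m_i m_j$. The function names are hypothetical.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_div_exact(a, s):
    """Quotient of a by 1 + q + ... + q^(s-1), assuming the division is exact."""
    quotient = [0] * (len(a) - (s - 1))
    for i in range(len(quotient)):
        # Every coefficient of the divisor is 1, so subtract earlier quotient terms.
        quotient[i] = a[i] - sum(quotient[i - j] for j in range(1, min(i, s - 1) + 1))
    return quotient

def s_distribution(sizes):
    """Exact null distribution of S as a dict {value of S: probability}."""
    n = sum(sizes)
    poly = [1]
    for s in range(1, n + 1):            # numerator of (3), up to the factor n!
        poly = poly_mul(poly, [1] * s)
    for m in sizes:                      # denominator of (3), one group at a time
        for s in range(1, m + 1):
            poly = poly_div_exact(poly, s)
    total = sum(poly)                    # equals n! / (m_1! ... m_k!)
    max_s = len(poly) - 1                # equals the sum of m_i * m_j over i < j
    return {2 * e - max_s: c / total for e, c in enumerate(poly)}

dist = s_distribution([2, 2])
print(sorted(dist.items()))
```

A tail probability such as $\Pr[S \ge 46]$ for four groups of four can then be obtained by summing the entries of `s_distribution([4, 4, 4, 4])` with key at least 46, and compared with the tabulated value quoted in § 2.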
5. THE CUMULANTS OF S
The first four cumulants may be obtained by expanding (4) and become

$$\kappa_1(m_1, \ldots, m_k) = 0,$$
$$\kappa_2(m_1, \ldots, m_k) = \frac{1}{18} \left\{n^2(2n+3) - \sum_{r=1}^{k} m_r^2(2m_r+3)\right\},$$
$$\kappa_3(m_1, \ldots, m_k) = 0,$$
$$\kappa_4(m_1, \ldots, m_k) = -\frac{1}{225} \left\{n^3(6n^2+15n+10) - \sum_{r=1}^{k} m_r^3(6m_r^2+15m_r+10)\right\};$$

and hence $\gamma_1 = 0$,

$$\gamma_2 = \frac{\kappa_4}{\kappa_2^2} = -\frac{36}{25} \cdot \frac{n^3(6n^2+15n+10) - \sum_{r=1}^{k} m_r^3(6m_r^2+15m_r+10)}{\left\{n^2(2n+3) - \sum_{r=1}^{k} m_r^2(2m_r+3)\right\}^2},$$

where $\sum_{r=1}^{k} m_r = n$.
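The closed forms for the second and fourth cumulants are easy to evaluate exactly with rational arithmetic. The following sketch does so; the function names `kappa2`, `kappa4` and `gamma2` are hypothetical labels for the expressions above.

```python
from fractions import Fraction

def kappa2(sizes):
    """Variance of S for groups (sets of ties) of the given sizes."""
    n = sum(sizes)
    g = lambda x: x * x * (2 * x + 3)
    return Fraction(g(n) - sum(g(m) for m in sizes), 18)

def kappa4(sizes):
    """Fourth cumulant of S for groups of the given sizes."""
    n = sum(sizes)
    g = lambda x: x ** 3 * (6 * x * x + 15 * x + 10)
    return -Fraction(g(n) - sum(g(m) for m in sizes), 225)

def gamma2(sizes):
    """Standardized fourth cumulant kappa_4 / kappa_2^2."""
    return kappa4(sizes) / kappa2(sizes) ** 2

# Four groups of four, as in the numerical example of section 2.
print(kappa2([4, 4, 4, 4]), float(kappa2([4, 4, 4, 4])) ** 0.5)
```

For two groups of two items the values of $S$ are $-4, -2, 0, 0, 2, 4$ with equal probability, which gives a variance of $20/3$ and a fourth cumulant of $-128/3$ by direct calculation; the formulas return the same values.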


If $k$ is a factor of $n$, so that $km = n$, then the groups are of equal size, $m$, and we have

$$\kappa_1(m) = 0,$$
$$\kappa_2(m) = \frac{1}{18} \{k(k-1) m^2 (2m(k+1) + 3)\},$$
$$\kappa_3(m) = 0,$$
$$\kappa_4(m) = -\frac{1}{225} k(k-1) m^3 \{k^3(6m^2) + k^2(6m^2+15m) + k(6m^2+15m+10) + 6m^2+15m+10\},$$

and hence $\gamma_1 = 0$,

$$\gamma_2 = -\frac{36}{25} \cdot \frac{k^3(6m^2) + k^2(6m^2+15m) + k(6m^2+15m+10) + 6m^2+15m+10}{(k-1)\, k\, m\, \{2m(k+1)+3\}^2}.$$





If $\kappa_{2\alpha}(n)$ stands for the $2\alpha$th cumulant of $S$ for the usual untied rankings of $n$ items, we may
express (4) as

$$\kappa_{2\alpha}(m_1, m_2, \ldots, m_k) = \kappa_{2\alpha}(n) - \sum_{r=1}^{k} \kappa_{2\alpha}(m_r),$$

where $\sum_{r=1}^{k} m_r = n$, a result obtained by Iyer (1952) employing a more cumbrous method.
If in (4) we put $m_1 = m_2 = \ldots = m_k = 1$, then $k = n$, and we have

$$\kappa_{2\alpha}(n) = (-1)^{\alpha+1} \frac{2^{2\alpha-1} B_\alpha}{\alpha} \left(\sum_{p=1}^{n} p^{2\alpha} - n\right),$$

an expression obtained previously by Silverstone (1950) and Moran (1950). When $m_1 = h$
and $m_2 = k-h$, $m_3 = m_4 = \ldots = m_k = 0$, we get the result obtained by Haldane & Smith
(1948) for the even-order cumulants of a distribution studied in connexion with birth-order
effects, in which their symbol $A$ is replaceable by $\frac{1}{2}\{h(k+1) - S\}$ when $2h > k$.

When $m_1 = m$ and $m_2 = n$, $m_3 = m_4 = \ldots = m_k = 0$, we get, after suitable reduction, the
expressions given by Mann & Whitney (1947) for the first four moments of the distribution
of $U$, a statistic studied in connexion with a two-sample test, and which is related to the
coefficient $S$ by $S = m_1 m_2 - 2U$.
If $\alpha$ is put equal to 1, we get the expression for the variance of the distribution of $S$ in the
most general case when one of the rankings contains ties of any extent and number, as given
by Kendall (1948).


6. THE EXTREME VALUES OF THE CUMULANTS 
It is useful both for practical and theoretical reasons to know the maximum and
minimum values which the cumulants may take for various partitions of the total sample
of $n$ items among $k$ groups. Formally we require the maxima and minima of

$$(-1)^{\alpha+1} \frac{2^{2\alpha-1} B_\alpha}{\alpha} \left\{\sum_{p=1}^{n} p^{2\alpha} - \sum_{r=1}^{k} \sum_{p=1}^{m_r} p^{2\alpha}\right\},$$

where the $m_r$'s may vary subject to the side conditions that $\sum_{r=1}^{k} m_r = n$, and that $m_r$ may
only take integer values greater than zero. Further, since the values of $\alpha$ and $n$ are taken
as constant, we may simply consider the extreme values of

$$\sum_{r=1}^{k} \sum_{p=1}^{m_r} p^{2\alpha}, \quad \text{where } \sum_{r=1}^{k} m_r = n.$$

Consider now the partition of $n$ such that

$$m_1 = m_2 = \ldots = m_{k-1} = 1 \quad \text{and} \quad m_k = n-k+1. \tag{5}$$

Then

$$\sum_{r=1}^{k} \sum_{p=1}^{m_r} p^{2\alpha} = \sum_{p=1}^{n-k+1} p^{2\alpha} + (k-1) = G_1(m_r), \text{ say.}$$

Any other possible partition may be obtained from (5) by putting $m_1 = 1+d_1$, $m_2 = 1+d_2$,
..., $m_{k-1} = 1+d_{k-1}$, and $m_k = n-k+1-d$, where $\sum_{r=1}^{k-1} d_r = d$ and the $d$'s are so arranged that

$$0 \le d_1 \le d_2 \le \ldots \le d_{k-1} \le n-k-d.$$

With this latter partition we have

$$\sum_{r=1}^{k} \sum_{p=1}^{m_r} p^{2\alpha} = \sum_{p=1}^{n-k+1-d} p^{2\alpha} + \sum_{r=1}^{k-1} \sum_{p=1}^{1+d_r} p^{2\alpha} = G_2(m_r), \text{ say.}$$

Then

$$G_1(m_r) - G_2(m_r) = \sum_{p=1}^{n-k+1} p^{2\alpha} + (k-1) - \sum_{p=1}^{n-k+1-d} p^{2\alpha} - \sum_{r=1}^{k-1} \sum_{p=1}^{1+d_r} p^{2\alpha}$$
$$= \sum_{p=n-k+1-d+1}^{n-k+1} p^{2\alpha} - \sum_{r=1}^{k-1} \sum_{p=2}^{1+d_r} p^{2\alpha}$$
$$= \sum_{r=1}^{k-1} \sum_{p=1}^{d_r} \left[\left(n-k+1-d+p+\sum_{t=1}^{r-1} d_t\right)^{2\alpha} - (1+p)^{2\alpha}\right],$$

after some manipulation.

But in view of the inequalities given above, the expression within the brackets will always
be positive or zero, and hence the maximum value of $\sum_{r=1}^{k} \sum_{p=1}^{m_r} p^{2\alpha}$ is given by $\sum_{p=1}^{n-k+1} p^{2\alpha} + (k-1)$.

By a similar form of argument, it may be shown that if $n = Ak+B$, $(0 \le B < k)$, then the
partition of the groups given by

$$m_1 = m_2 = \ldots = m_{k-B} = A, \quad m_{k-B+1} = \ldots = m_k = A+1, \tag{6}$$

yields a minimum value of $\sum_{r=1}^{k} \sum_{p=1}^{m_r} p^{2\alpha}$ equal to $(k-B) \sum_{p=1}^{A} p^{2\alpha} + B \sum_{p=1}^{A+1} p^{2\alpha}$.
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Hence the maximum value of k,, is given by the partition (6) and is equal to 
PB Qea-1( 2 A A+1 
max ky, = (— === — | F p— (kB) E p*—- BE p*l, 
p=1 p=1 p=1 


a 
while the minimum value of x,, is te a the seems (5) and is equal to 


—k+1 
Ep" p—(b-1)]. 


In particular, the variance of the distribution of S is a maximum when the ties are as 
nearly equal in size as possible. 
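The statement about the variance can be checked by brute force for a small case. The sketch below assumes the variance formula quoted in §7, namely {n²(2n+3) − Σ m_r²(2m_r+3)}/18; the function names `var_S` and `partitions` are illustrative and not part of the paper.

```python
def var_S(sizes):
    """Variance of S for group sizes m_1..m_k, using the Section 7 formula."""
    n = sum(sizes)
    total = n * n * (2 * n + 3) - sum(m * m * (2 * m + 3) for m in sizes)
    return total / 18.0


def partitions(n, k, smallest=1):
    """Yield the non-decreasing partitions of n into exactly k positive parts."""
    if k == 1:
        if n >= smallest:
            yield (n,)
        return
    for first in range(smallest, n // k + 1):
        for rest in partitions(n - first, k - 1, first):
            yield (first,) + rest


if __name__ == "__main__":
    parts = list(partitions(10, 5))
    print("largest variance :", max(parts, key=var_S))
    print("smallest variance:", min(parts, key=var_S))
```

For ten items in five groups the search returns the equal partition as the maximizer and the partition (5), with four groups of one, as the minimizer.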


7. THE LARGE SAMPLE DISTRIBUTION 
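The standardized $\alpha$th cumulant of the distribution of S is
$$\gamma_\alpha=\frac{\kappa_{2\alpha}}{\kappa_2^{\alpha}}=\frac{(-)^{\alpha-1}B_\alpha\,2^{2\alpha-1}\left\{\sum_{p=1}^{n}p^{2\alpha}-\sum_{r=1}^{k}\sum_{p=1}^{m_r}p^{2\alpha}\right\}}{\alpha B_1^{\alpha}\,2^{\alpha}\left\{\sum_{p=1}^{n}p^{2}-\sum_{r=1}^{k}\sum_{p=1}^{m_r}p^{2}\right\}^{\alpha}}.$$
Now
$$\sum_{p=1}^{n}p^{2\alpha}=n^{2\alpha+1}\left\{\frac{1}{2\alpha+1}+\frac{1}{2n}+\sum_{i=1}^{\alpha}\frac{(-)^{i-1}B_i(2\alpha)!}{(2i)!\,\{2(\alpha-i)+1\}!}\,\frac{1}{n^{2i}}\right\}.$$
Hence, using this expression, we have
$$\gamma_\alpha=\frac{(-)^{\alpha-1}B_\alpha\,2^{\alpha-1}\left\{\dfrac{u(2\alpha+1)}{2\alpha+1}+\dfrac{u(2\alpha)}{2n}+\sum_{i=1}^{\alpha}\dfrac{(-)^{i-1}B_i(2\alpha)!}{(2i)!\,\{2(\alpha-i)+1\}!}\,\dfrac{u(2\alpha-2i+1)}{n^{2i}}\right\}}{\alpha B_1^{\alpha}\,n^{\alpha-1}\left\{\tfrac13u(3)+\dfrac{u(2)}{2n}\right\}^{\alpha}},\qquad(7)$$
where
$$u(x)=1-\sum_{r=1}^{k}\left(\frac{m_r}{n}\right)^{x}.$$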


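Taking k, the number of groups, as finite, let the size of the groups increase with n so that
$\lim_{n\to\infty}m_i/n=a_i$. Further, let $a_i=\lambda_i$ for $i=1,\ldots,t$ and $a_i=0$ for $i=t+1,\ldots,k$. Now
$$\sum_{i=1}^{t}\left(\frac{m_i}{n}\right)+\sum_{i=t+1}^{k}\left(\frac{m_i}{n}\right)=1,$$
and proceeding to the limit we have $\sum_{i=1}^{t}\lambda_i=1$. Hence
$$\lim_{n\to\infty}\sum_{r=1}^{k}\left(\frac{m_r}{n}\right)^{2\alpha+1}=\sum_{i=1}^{t}\lambda_i^{2\alpha+1}\le1.$$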

If $t>1$, then $\lim_{n\to\infty}\sum_{r=1}^{k}(m_r/n)^{2\alpha+1}<1$ and $\lim_{n\to\infty}u(2\alpha+1)>0$ $(\alpha>0)$. Hence, from (7), it can
be seen that $\lim\gamma_\alpha=0$. Hence if more than one group is allowed to increase without limit as
$n\to\infty$, then the limit distribution of S will be normal. If, on the other hand, $t=1$, then
$\lim_{n\to\infty}\sum_{r=1}^{k}(m_r/n)^{2\alpha+1}=1$. In this case the groups will be such that $m_1=n-d$ and $\sum_{r=2}^{k}m_r=d$,
where d is constant with respect to n.
Further, we have now
$$u(x)=1-\sum_{r=1}^{k}\left(\frac{m_r}{n}\right)^{x}=\frac1n\left\{xd+O\!\left(\frac1n\right)\right\}-\frac{1}{n^{x}}\sum_{r=2}^{k}m_r^{x},$$
hence $\lim_{n\to\infty}nu(x)=xd$, and, taking the limit of (7), we obtain
$$\lim_{n\to\infty}\gamma_\alpha=\frac{(-)^{\alpha-1}B_\alpha\,2^{\alpha-1}}{\alpha B_1^{\alpha}\,d^{\alpha-1}}.$$
Hence if only one group is allowed to increase without limit as $n\to\infty$, then the limit
distribution of S will be non-normal and platykurtic. However, with a large number of groups,
or with a large number of items in the groups, the value of d should be large enough to make
the distribution normal for most practical purposes. We have, in particular,
$$\lim_{n\to\infty}\gamma_2=-\frac{1.2}{\sum_{i=2}^{k}m_i}=-\frac{1.2}{\bar m(k-1)},$$
where $\bar m$ stands for the mean size of the $k-1$ constant groups. In the extreme case where
$\bar m=1$, we have $\lim\gamma_2=-1.2/(k-1)$.


It follows that for large samples we may in most cases assume
$$\frac{S}{\sqrt{\dfrac{1}{18}\left\{n^{2}(2n+3)-\sum_{r=1}^{k}m_r^{2}(2m_r+3)\right\}}}$$
to have a standard normal distribution. Since the interval between all possible adjacent
values of S is always two, an improvement in the approximation to the true distribution
will be obtained if unity is subtracted from an obtained value of S, prior to its division by
the standard deviation.
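The normal approximation with the continuity correction can be sketched in a few lines. The function name `normal_tail` is illustrative; the example assumes the variance formula displayed above and the case m_1 = 10, m_2 = m_3 = 1 treated in Table 2.

```python
import math


def normal_tail(s0, sizes):
    """Approximate Pr{S >= s0} by the normal curve, deducting unity from s0."""
    n = sum(sizes)
    var = (n * n * (2 * n + 3) - sum(m * m * (2 * m + 3) for m in sizes)) / 18.0
    z = (s0 - 1) / math.sqrt(var)
    # Upper tail of the standard normal via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    for s0 in (1, 3, 21):
        print(s0, round(normal_tail(s0, (10, 1, 1)), 4))
```

The values agree with the "Normal approximation" column of Table 2 to the four decimals given there for the rows checked.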

A somewhat closer approximation to the true distribution of S may be obtained by fitting
a Pearson Type II curve and transforming to a Type VII or t-distribution. This is suggested
by the fact that the range of S is finite, the distribution is symmetrical, and the value of
$\beta_2$ less than 3.

Representing the distribution by a Type II curve we have
$$f(S)=\frac{1}{a\,B(m+1,\tfrac12)}\left(1-\frac{S^{2}}{a^{2}}\right)^{m}\quad(-a\le S\le a),$$
where
$$m=\frac{9-5\beta_2}{2\beta_2-6}\quad\text{and}\quad a^{2}=\frac{2\mu_2\beta_2}{3-\beta_2}.$$
If now we put
$$t=\frac{S}{a}\left\{\frac{2(m+1)}{1-S^{2}/a^{2}}\right\}^{1/2}\quad(-\infty<t<+\infty),$$
we have
$$f(t)\,dt=\frac{dt}{B(m+1,\tfrac12)\sqrt{(2m+2)}\,\{1+t^{2}/(2m+2)\}^{\frac12(2m+3)}},$$
which is the t-distribution with $\nu$ degrees of freedom, where $\nu=2(m+1)$.
Hence the distribution of S will be approximated by
$$t_\nu=S\left\{\frac{\nu}{(\nu+1)\mu_2-S^{2}}\right\}^{1/2}$$
distributed as 'Student's' t with $\nu$ degrees of freedom, where
$$\nu=-3(2+\gamma_2)/\gamma_2.$$
Considering the case where, as $n\to\infty$, we have $\gamma_2\to-0$; then $\nu\to\infty$ and $t_\nu\to S/\sqrt{\mu_2}$,
giving the normal approximation obtained above. A continuity correction can again be
applied here by deducting unity from the value of S prior to computing $t_\nu$.
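The equivalent degrees of freedom can be computed directly from the second and fourth cumulants. The sketch below makes two assumptions that are not spelled out in this section: the closed form for $\kappa_4$ (obtained from the general cumulant expression with $B_2 = 1/30$), and the form of $t_\nu$ written in terms of $\mu_2 = \kappa_2$. The names `cumulants` and `t_statistic` are illustrative.

```python
import math


def cumulants(sizes):
    """Return (kappa_2, kappa_4) of S for group sizes m_1..m_k (assumed closed forms)."""
    n = sum(sizes)

    def g2(m):
        return m * m * (2 * m + 3)

    def g4(m):
        return 6 * m**5 + 15 * m**4 + 10 * m**3 - 31 * m

    k2 = (g2(n) - sum(g2(m) for m in sizes)) / 18.0
    k4 = -(g4(n) - sum(g4(m) for m in sizes)) / 225.0
    return k2, k4


def t_statistic(s0, sizes):
    """Return (gamma_2, nu, t_nu) with unity deducted from s0 as continuity correction."""
    k2, k4 = cumulants(sizes)
    gamma2 = k4 / k2**2
    nu = -3.0 * (2.0 + gamma2) / gamma2
    s = s0 - 1
    t = s * math.sqrt(nu / ((nu + 1.0) * k2 - s * s))
    return gamma2, nu, t


if __name__ == "__main__":
    print(t_statistic(3, (10, 1, 1)))
```

For sizes 10, 1 and 1 this reproduces the values of $\gamma_2$ and of the degrees of freedom quoted in §8, which is some reassurance that the assumed closed form for $\kappa_4$ is the intended one.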


8. THE EXACT DISTRIBUTION OF S FOR SMALL SAMPLES 


With small samples we may always obtain the exact sampling distribution of S by use of 
the general probability-generating function (2). The task of tabling all the possible com- 
binations of number of samples and sample sizes is, however, formidable, even for small 
values of n. We give therefore tables only for situations where the sizes of the samples are 
all equal.* In computations for this purpose a number of recursive formulae can be em- 
ployed for reducing the labour involved in determining the coefficient of X. 

We have
$$f_k(m)=f_k(m-1)\,\frac{[(km-k)!]\,m^{k}\,X^{-\frac12k(k-1)(2m-1)}\prod_{s=k(m-1)+1}^{km}(1-X^{2s})}{[(km)!]\,(1-X^{2m})^{k}},$$
where $f_k(m)=f(m,\ldots,m)$ (m repeated k times).

Now $f_k(1)$ is the distribution of the values of S when each sample only contains one
number and has already been tabled by Kendall (1948). Hence by the formula given above,
tables of increasing equal sample sizes for a fixed k may be fairly easily computed using the
method of detached coefficients. Table 3 gives the probability of obtaining values of S
greater than or equal to a fixed value for all cases of equal-sized samples where the normal
distribution does not appear to be adequate.

When the samples are of unequal size and small it is necessary to compute the probability
distribution of S from the general generating function (2). A more convenient form is as
follows:
$$f(m_1,m_2,\ldots,m_k)=\frac{(n!)^{-1}\,X^{-\frac12\left(n^{2}-\sum m_r^{2}\right)}\prod_{s=1}^{n}(1-X^{2s})}{\prod_{r=1}^{k}\left\{(m_r!)^{-1}\prod_{s=1}^{m_r}(1-X^{2s})\right\}}.$$
Expanding this expression by the method of detached coefficients we obtain the exact
probability distribution of S for any set of $m_r$'s.
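The expansion can be mechanized. The sketch below does not expand the generating function literally; it assumes the equivalent statement that the number of arrangements with a given value of S is a coefficient of the Gaussian (q-) multinomial, built as a product of Gaussian binomials by the usual recurrence. The names `q_binomial`, `poly_mul` and `exact_tail` are illustrative.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def q_binomial(n, k):
    """Coefficients (lowest degree first) of the Gaussian binomial [n choose k]_q."""
    if k == 0 or k == n:
        return (1,)
    out = [0] * (k * (n - k) + 1)
    # Recurrence: [n, k] = [n-1, k-1] + q^k [n-1, k].
    for i, c in enumerate(q_binomial(n - 1, k - 1)):
        out[i] += c
    for i, c in enumerate(q_binomial(n - 1, k)):
        out[i + k] += c
    return tuple(out)


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def exact_tail(sizes):
    """Return {s0: Pr{S >= s0}} as exact fractions under the null hypothesis."""
    counts, running = [1], 0
    for m in sizes:
        running += m
        counts = poly_mul(counts, q_binomial(running, m))
    total = sum(counts)
    max_s = len(counts) - 1  # equals the sum of m_i * m_j over pairs i < j
    tail, acc = {}, 0
    for p in range(max_s, -1, -1):
        acc += counts[p]
        tail[2 * p - max_s] = Fraction(acc, total)  # exponent p corresponds to S = 2p - max_s
    return tail


if __name__ == "__main__":
    tail = exact_tail((10, 1, 1))
    print(float(tail[3]), float(tail[21]))
```

Exact fractions are used so that the tabulated four-figure probabilities can be compared without rounding error.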

It is interesting to observe that as the sample sizes become more and more heterogeneous,
so the distribution of S becomes more and more platykurtic. Table 1 shows the values of
$\gamma_2$ corresponding to the partitions of n = 10 into five samples. The tendency shown by these
figures would seem to be in general accord with the result obtained above; that the limiting
distribution is non-normal when at most one of the groups is allowed to increase without
limit.

The accuracy of the normal and t-distribution approximations is shown in Table 2. This
table gives the exact probability distribution of S for three samples of sizes 10, 1 and 1,
together with the normal and t-distribution approximations. The value of $\gamma_2$ is in this case
-0.6137; the variance is 87.6667, and the appropriate equivalent degrees of freedom for
the t-distribution is 6.7765. It will be seen that even in this rather extreme case the normal
distribution is sufficiently accurate for most purposes. The t-distribution gives in general
slightly more accurate values for the tail of the distribution.

* Sillito (1947) gives tables for $3\le n\le10$ and $1\le m_r\le3$.

It will be shown in a future publication that the test is both consistent and unbiased as 
the sizes of the samples increase. 


My thanks are due to Mr J. W. Whitfield and Mr A. Summerfield for help in the presentation
of this paper.

Notes added in proof. (1) In Proc. K. Akad. Wet. Amst. A, 56, 433 (1953) Terpstra
considers the asymptotic normality of S when ties are present in one ranking, under
certain conditions which differ from mine, and gives some exact distributions.

(2) The possibility of ties. In presenting the test based on S the possibility of ties 
occurring among the sample values has been excluded. This follows from the assumption 
that the cumulative distribution functions are continuous and hence that the probability 
of obtaining a tie is zero. If, in practice, ties should occur within a particular sample, 
then the value of S is not affected; it is only when they occur between the values of two 
different samples that a difficulty arises. In this case the most stringent action might be 
to untie the items tied between samples in such a manner that their values become the 
most unfavourable possible for the alternative hypothesis under consideration. Should 
the value of S still prove to be significant with this treatment, the null hypothesis may 
be rejected with some confidence in favour of the alternative. This treatment is of course 
only feasible if the number of ties between items from different samples is small; should 
this not be the case it would tend to indicate that the populations from which the 
samples have been drawn are not continuous and hence invalidate the use, not only of 
this test but of most others. An adequate treatment of this situation will only be 
achieved when more is known concerning the small sample distribution of values of S 
computed from rankings, both of which contain ties of any extent and number.

Table 1. Values of $\gamma_2=\kappa_4/\kappa_2^2$ corresponding to the partitions of ten items into five samples

m_1   m_2   m_3   m_4   m_5      γ_2
 6     1     1     1     1     -0.329
 5     2     1     1     1     -0.282
 4     3     1     1     1     -0.261
 4     2     2     1     1     -0.254
 3     3     2     1     1     -0.246
 3     2     2     2     1     -0.240
 2     2     2     2     2     -0.234

Table 2. Approximations to the true distribution given by the normal and the
t-distribution, for case m_1 = 10, m_2 = m_3 = 1, n = 12

         Exact        Normal approximation       t approximation
 S    probability    Probability     Error     Probability     Error
 1      0.5000         0.5000       0.0000       0.5000       0.0000
 3      0.4167         0.4154      +0.0013       0.4237      -0.0070
 5      0.3409         0.3346      +0.0063       0.3494      -0.0085
 7      0.2727         0.2608      +0.0119       0.2795      -0.0068
 9      0.2121         0.1964      +0.0157       0.2156      -0.0035
11      0.1591         0.1428      +0.0163       0.1590      +0.0001
13      0.1136         0.1000      +0.0136       0.1110      +0.0026
15      0.0758         0.0674      +0.0084       0.0720      +0.0038
17      0.0455         0.0437      +0.0018       0.0425      +0.0030
19      0.0227         0.0273      -0.0046       0.0219      +0.0008
21      0.0076         0.0163      -0.0087       0.0091      -0.0015

Table 3. Table giving Pr{S ≥ S_0} for k samples each of size m

                k = 3                          k = 3                    k = 3, m = 5
S_0     m = 2       m = 4       S_0     m = 3       m = 5       S_0     (continued)
 0      0.5778      0.5284       1      0.5000      0.5000      51      0.00303
 2      0.4222      0.4716       3      0.4155      0.4589      53      0.00204
 4      0.2889      0.4156       5      0.3339      0.4182      55      0.00134
 6      0.1667      0.3609       7      0.2595      0.3783      57      0.000851
 8      0.0889      0.3090       9      0.1940      0.3396      59      0.000526
10      0.0333      0.2602      11      0.1387      0.3025      61      0.000313
12      0.0111      0.2157      13      0.0946      0.2672      63      0.000180
14        -         0.1756      15      0.0613      0.2340      65      0.0000978
16        -         0.1404      17      0.0369      0.2032      67      0.0000502
18        -         0.1099      19      0.0208      0.1748      69      0.0000238
20        -         0.0844      21      0.0107      0.1489      71      0.0000106
22        -         0.0632      23      0.00476     0.1256      73      0.00000396
24        -         0.0463      25      0.00179     0.1049      75      0.00000132
26        -         0.0330      27      0.000595    0.0867
28        -         0.0229      29        -         0.0708
30        -         0.0153      31        -         0.0572
32        -         0.00993     33        -         0.0456
34        -         0.00615     35        -         0.0359
36        -         0.00367     37        -         0.0279
38        -         0.00205     39        -         0.0214
40        -         0.00110     41        -         0.0161
42        -         0.000519    43        -         0.0120
44        -         0.000231    45        -         0.00873
46        -         0.0000866   47        -         0.00626
48        -         0.0000289   49        -         0.00440

Table 3 (cont.)

               k = 4                       k = 5             k = 6
S_0    m = 2    m = 3    m = 4      m = 2    m = 3      m = 2      S_0

0 0-5492 0-5276 0-5183 0-5353 0-5198 0-5271 0 

2 0-4508 0-4724 | 0-4817 0-4647 0-4802 0-4729 2 

4 0-3563 0-4177 | 0-4454 0-3951 0-4408 0-4193 4 

6 0-2683 0-3645 0-4094 0-3285 0-4019 0-3670 6 

8 0-1929 0-3136 0-3742 0-2667 0-3640 0-3170 8 
10 0-1302 0-2659 0-3400 0-2110 0-3273 0-2699 10 
12 0-0829 0-2220 0-3069 01625 0-2921 0-2265 12 
14 0-0484 0-1823 0-2754 0-1213 0-2588 0-1871 14 
16 0-0262 0-1472 0-24.54 0-0878 0-2274 0-1521 16 
18 0-0123 0-1166 | 0-2172 0-0613 0-1982 0-1215 18 
20 0-02516 00907 | 0-1910 0-0412 0-1713 0-0953 20 
22 0-02159 0-0691 | 0-1666 0-0265 0-1468 0-0734 22 
24 0-03397 0-0515 0-1443 0-0162 01247 0-0553 24 
26 sa 0-0374 0-1241 0-02939 0-1049 0-0408 26 
28 as. 0-0266 0-1058 0-02511 0-0874 0-0294 28 
30 ron 0-0183 0-0895 0-02257 0:0721 0-0207 30 
32 a 0-0123 0-0751 0-02118 -0588 0-0142 32 
34 pene 0-02797 | 0-0624 0-0°476 0-0475 0-02943 34 
36 x 0-02498 0-0514 0-0°168 0-0379 002608 36 
38 ~ 0-02299 0-0420 0-04441 0-0299 0-02379 38 
40 ie 0-02171 0-0339 0-05882 0-0234 0-02227 40 
42 sai 0-09928 | 0-0272 ser 0-0180 0-02130 42 
44 sia 0-0°471 | 0-0215 a 0-0137 0-0°710 44 
46 . 0-03222 | 00168 = 0-0102 0-0°366 46 
48 oe 0-0'947 | 0-0130 ae 0-02755 0-0°177 48 
50 dass 0-04352 |  0-02998 _ 0-02549 0-04787 50 
52 aie 0-04108 | 0-0°754 md 0-02392 0-04319 52 
54 - 0-05271 | 00-0562 pa 0-02275 0-04114 54 
56 = mS | 002414 = 002190 0-05347 56 
58 a. 45 | 002230 cos 0-02128 0-0°802 58 
60 = re | 002214 Si 0-03848 0-0°134 60 
62 ae iol | 0-02150 Seem 0-03548 ore 62 
64 “he. 2: | 0-02104 —~ 0-0°345 64 
66 a ant | 008701 ae U-0°212 tu 66 
68 = ‘ee | 0-09465 on 0-0°126 vite 68 
70 ont i 0-0°301 be 0-04724 “¢ 70 
72 po ar 0-0°191 ‘am 0-04401 ok 72 
14 fii ee 0-09118 ak 0-04213 ue 74 
76 ih he 0-0*705 = 0-04108 _ 716 
78 nb ne 0-04409 ‘ 0-05514 os 78 
80 hie iol 0-04229 0-05230 i 80 
82 Ue iat 0-04122 di. 0-0°945 ik 82 
84 - —- 0-05626 Fn 0-0°351 na 84 
86 a -“ 0-05301 0-0°113 ad 86 
88 — ae. 0-0°136 ra 0-07297 = 88 
90 a ss 0-0°555 “ 0-08595 i; 90 
92 we 0-0°206 se a 92 
94 a - 0-07634 . - = 94 
96 — bbe: 0-07159 <i. wed on 96 
TWO-STAGE PROCEDURES FOR ESTIMATING THE DIFFERENCE
BETWEEN MEANS†


By S. G. GHURYE AND HERBERT ROBBINS
University of North Carolina


INTRODUCTION AND SUMMARY 


Given two populations $P_i$ $(i=1,2)$ with unknown means $\theta_i$ and variances $\sigma_i^2$, we wish to
estimate the difference $\theta_1-\theta_2$. Let $t_i(n)$ be the mean $(X_{i,1}+X_{i,2}+\ldots+X_{i,n})/n$ of a sample
from $P_i$. Then $t_1(n_1)-t_2(n_2)$ is an unbiased estimate of $\theta_1-\theta_2$, with variance $\sum_i(\sigma_i^2/n_i)$.
Assuming the cost of sampling to be a known linear function of the number of observations,
the cost of taking $n_1$ observations from $P_1$ and $n_2$ from $P_2$ is $a_1n_1+a_2n_2+a_3$. If there is a
prescribed upper bound $A_0$ to the cost of sampling, $n_1$ and $n_2$ are subject to the restriction
$$a_1n_1+a_2n_2\le A=A_0-a_3.\qquad(0.1)$$
The quantity $\sum_i(\sigma_i^2/n_i)$, equal to the variance of $t_1(n_1)-t_2(n_2)$ for integer values of the $n_i$,
is minimized for continuous $n_i>0$ subject to (0.1) by taking $n_i=n_i^0$, where
$$n_i^0=(A/a_i)\,a_i^{1/2}\sigma_i\Big/\sum_j a_j^{1/2}\sigma_j;\qquad(0.2)$$
the minimum value being equal to
$$V^0(A)=\sum_i(\sigma_i^2/n_i^0)=A^{-1}\Bigl(\sum_i a_i^{1/2}\sigma_i\Bigr)^{2}.\qquad(0.3)$$
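A small numerical sketch of (0.2) and (0.3) follows. The function name `optimal_allocation` and the numerical inputs are illustrative only; the allocation is continuous, as in the text, and is not rounded to integers.

```python
import math


def optimal_allocation(A, a, sigma):
    """Return (n0, v0): the continuous allocation (0.2) and the minimum variance (0.3)."""
    denom = sum(math.sqrt(ai) * si for ai, si in zip(a, sigma))
    n0 = [(A / ai) * math.sqrt(ai) * si / denom for ai, si in zip(a, sigma)]
    v0 = denom**2 / A
    return n0, v0


if __name__ == "__main__":
    n0, v0 = optimal_allocation(A=30.0, a=(1.0, 1.0), sigma=(1.0, 2.0))
    print(n0, v0)
    # The allocation spends the whole budget and attains the stated variance.
    print(sum(ai * ni for ai, ni in zip((1.0, 1.0), n0)))
    print(sum(s * s / ni for s, ni in zip((1.0, 2.0), n0)))
```

With equal unit costs the observations are divided in proportion to the standard deviations, so the more variable population receives the larger sample.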


When the ratio $\sigma_1/\sigma_2$ on which the optimum values (0.2) depend is not known, we can
use a two-stage procedure for estimating $\theta_1-\theta_2$, first taking a sample of $m_1+m_2$ observations,
$m_i$ from $P_i$, and then using estimates of $\sigma_i$ obtained from this preliminary sample to
distribute the remaining observations between the $P_i$. We shall investigate the performance
of this estimation procedure.

For previous work done on problems of this kind, reference may be made to Putter (1951) 
and the literature cited in that paper. Putter considers the problem of estimating the mean 
of a population composed of a known number of normally distributed strata whose relative 
proportions are known. See also Robbins (1952, p. 528). 

In §1 we assume the $P_i$ to be normal and evaluate the variance of the two-stage estimate.
In §2 we show that as $m_1$, $m_2$ and $A\to\infty$ in a certain way, the ratio of this variance to the
minimum variance $V^0(A)$ tends to unity, and we also prove the asymptotic result for more
general populations.


† Work supported by the U.S. Air Force under Contract AF 18(600)-83, monitored by the Office
of Scientific Research.

1. NORMAL POPULATIONS


When the $P_i$ are known to be normal, we choose positive integers $m_i$ such that $a_1m_1+a_2m_2\le A$
and take $m_i$ observations from $P_i$. Let
$$s_i^2(m_i)=\Bigl(\sum_{j=1}^{m_i}X_{i,j}^2-m_it_i^2(m_i)\Bigr)\Big/(m_i-1)=\text{estimated variance of }P_i,\dagger\qquad(1.1)$$
$$u(m_1,m_2)=a_1^{1/2}s_1(m_1)\Big/\sum_j a_j^{1/2}s_j(m_j);\qquad(1.2)$$
$$n_1^*=\begin{cases}m_1&\text{if }Au(m_1,m_2)/a_1<m_1,\\(A-a_2m_2)/a_1&\text{if }(A-a_2m_2)/a_1<Au(m_1,m_2)/a_1,\\Au(m_1,m_2)/a_1&\text{if }m_1\le Au(m_1,m_2)/a_1\le(A-a_2m_2)/a_1,\end{cases}\qquad(1.3)$$
$$n_2^*=(A-a_1n_1^*)/a_2,\qquad(1.4)$$
and
$$\tilde n_i=[n_i^*],\qquad(1.5)$$
where [x] is the largest integer in x. Having computed $\tilde n_i$ we take $(\tilde n_i-m_i)$ more observations
$(X_{i,j},\ j=m_i+1,\ldots,\tilde n_i)$ from $P_i$, and estimate $\theta_1-\theta_2$ by
$$t_1(\tilde n_1)-t_2(\tilde n_2)=\sum_{j=1}^{\tilde n_1}X_{1,j}/\tilde n_1-\sum_{j=1}^{\tilde n_2}X_{2,j}/\tilde n_2.\qquad(1.6)$$
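The second-stage sample sizes prescribed by (1.2) to (1.5) can be sketched as follows. The function name `two_stage_sizes` and the numerical inputs are illustrative; the first-stage standard deviations s_1 and s_2 are taken as given rather than computed from data.

```python
import math


def two_stage_sizes(s1, s2, a1, a2, m1, m2, A):
    """Total sample sizes (n1, n2) after the second stage, following (1.2)-(1.5)."""
    # (1.2): estimated share of the budget to spend on population 1.
    u = math.sqrt(a1) * s1 / (math.sqrt(a1) * s1 + math.sqrt(a2) * s2)
    target = A * u / a1
    upper = (A - a2 * m2) / a1
    # (1.3): truncate the target so that neither first-stage sample is "undone".
    n1_star = min(max(target, m1), upper)
    # (1.4): spend the rest of the budget on population 2.
    n2_star = (A - a1 * n1_star) / a2
    # (1.5): largest integers not exceeding the starred values.
    return math.floor(n1_star), math.floor(n2_star)


if __name__ == "__main__":
    print(two_stage_sizes(1.0, 2.0, 1.0, 1.0, 6, 6, 31.0))
    print(two_stage_sizes(1.0, 9.0, 1.0, 1.0, 6, 6, 31.0))
    print(two_stage_sizes(9.0, 1.0, 1.0, 1.0, 6, 6, 31.0))
```

The second and third calls show the two truncated cases of (1.3): when the estimated ratio is extreme, one population receives no observations beyond its first-stage sample.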
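Let
$$V(A)=\operatorname{var}\{t_1(\tilde n_1)-t_2(\tilde n_2)\}.\qquad(1.7)$$
Now
$$t_i(\tilde n_i)=m_it_i(m_i)/\tilde n_i+(\tilde n_i-m_i)\,t_i(\tilde n_i-m_i)/\tilde n_i,\qquad(1.8)$$
where $t_i(\tilde n_i-m_i)=\sum_{j=m_i+1}^{\tilde n_i}X_{i,j}/(\tilde n_i-m_i)$.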


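Since the $\tilde n_i$ depend only on the $s_i(m_i)$, for fixed $s_i$ the random variables $t_1(m_1)$, $t_2(m_2)$,
$t_1(\tilde n_1-m_1)$, $t_2(\tilde n_2-m_2)$ are mutually independent, the conditional distributions of $t_i(m_i)$
and $t_i(\tilde n_i-m_i)$ being respectively $N\{\theta_i,\sigma_i^2/m_i\}$ and $N\{\theta_i,\sigma_i^2/(\tilde n_i-m_i)\}$. Hence for fixed $s_i$
the conditional distribution of $t_1(\tilde n_1)-t_2(\tilde n_2)$ is $N\{\theta_1-\theta_2,\sum_i(\sigma_i^2/\tilde n_i)\}$. Consequently,
$$E\{t_1(\tilde n_1)-t_2(\tilde n_2)\}=\theta_1-\theta_2\quad\text{and}\quad V(A)=E\Bigl\{\sum_i(\sigma_i^2/\tilde n_i)\Bigr\}.\qquad(1.9)$$
Let
$$F(u)=\Pr\{u(m_1,m_2)\le u\}.\qquad(1.10)$$
Then from (1.5), (1.9) and (1.10) we have
$$V(A)=\{\sigma_1^2m_1^{-1}+\sigma_2^2[(A-a_1m_1)/a_2]^{-1}\}\,F(a_1m_1/A)$$
$$\quad+\{\sigma_1^2[(A-a_2m_2)/a_1]^{-1}+\sigma_2^2m_2^{-1}\}\,\{1-F(1-a_2m_2/A)\}$$
$$\quad+\int_{a_1m_1/A}^{1-a_2m_2/A}\{\sigma_1^2[Au/a_1]^{-1}+\sigma_2^2[A(1-u)/a_2]^{-1}\}\,dF(u).\qquad(1.11)$$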


In what follows we shall denote by $V^*(A)$ the expression obtained by dropping the square
brackets in (1.11).
Let
$$\rho=\sigma_2/\sigma_1,\quad c=(a_2/a_1)^{1/2},\quad b_1=(A-a_1m_1)/(a_2m_1),$$
$$b_2=a_1m_2/(A-a_2m_2),\quad r_i=(m_i-1)/2\quad\text{and}\quad q=r_2/r_1.\qquad(1.12)$$
Then from (0.3) we have $V^0(A)=A^{-1}a_1\sigma_1^2(1+\rho c)^2$,
and we can reduce the expression for $V^*(A)$ to
$$A\{V^*(A)-V^0(A)\}=a_1\sigma_1^2(\rho-cb_1)^2b_1^{-1}F(a_1m_1/A)$$
$$\quad+a_1\sigma_1^2(\rho-cb_2)^2b_2^{-1}\{1-F(1-a_2m_2/A)\}$$
$$\quad+\int_{a_1m_1/A}^{1-a_2m_2/A}a_1\sigma_1^2\{(1-u)u^{-1}+\rho^2c^2(1-u)^{-1}u-2\rho c\}\,dF(u).\qquad(1.13)$$
Finally, making the substitution
$$w=\frac{\rho^2c^2u^2}{\rho^2c^2u^2+q(1-u)^2}=\frac{(m_1-1)(s_1^2/\sigma_1^2)}{\sum_i(m_i-1)(s_i^2/\sigma_i^2)}=\frac{\chi^2_{m_1-1}}{\chi^2_{m_1-1}+\chi^2_{m_2-1}},\qquad(1.14)$$
so that w has the density function
$$f(w)=\{1/B(r_1,r_2)\}\,w^{r_1-1}(1-w)^{r_2-1}\quad(0<w<1),\qquad(1.15)$$
we reduce (1.13) to
$$V^*(A)/V^0(A)=1+(1+\rho c)^{-2}\{\rho cI_1+(\rho-cb_1)^2b_1^{-1}I_2+(\rho-cb_2)^2b_2^{-1}I_3\},\qquad(1.16)$$
where
$$I_1=\int_{\beta_1}^{\beta_2}\{(qw)^{1/2}(1-w)^{-1/2}+(qw)^{-1/2}(1-w)^{1/2}-2\}f(w)\,dw,$$
$$I_2=\int_0^{\beta_1}f(w)\,dw,\quad I_3=\int_{\beta_2}^{1}f(w)\,dw,\quad\beta_i=\rho^2/\{\rho^2+c^2qb_i^2\}.\qquad(1.17)$$

† Putter (1951) uses $s_i^2(m_i)(m_i-1)/(m_i-2)$ instead of $s_i^2(m_i)$ because it minimizes an expression
which is the variance of the estimate obtained by ignoring the fact that the sample sizes prescribed
by the two-stage procedure are truncated short of the extreme limits possible under (0.1).


Hence, $V^*/V^0$ can be computed by means of tables of the incomplete B-function. We have
done this for $a_1=a_2=a$, $N=A/a=30,\,50$, $m_1=m_2=m=(0.2)N,\,(0.3)N,\,(0.4)N$ and
various values of $\rho$. The results are given in Table 1.

For $a_1=a_2$, the usual procedure for estimating $\theta_1-\theta_2$ consists in taking $n_1=n_2=\tfrac12N$
and using the estimate $t_1(\tfrac12N)-t_2(\tfrac12N)$. The variance is
$$V'=2(\sigma_1^2+\sigma_2^2)/N.$$
For comparing $V^*$ with $V'$, the values of $V'/V^0$ are given in the last row of the table.

The two-stage procedure seems to effect considerable improvement over the usual one-stage
procedure for values of $\rho$ away from 1; and the performance seems to be best for m/N
in the neighbourhood of $\sigma_1/(\sigma_1+\sigma_2)$ if $\sigma_1<\sigma_2$.
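The last row of Table 1 follows from the two variance formulas alone. With $a_1 = a_2$ the ratio is $V'/V^0 = 2(1+\rho^2)/(1+\rho)^2$, which the sketch below evaluates; the function name `one_stage_ratio` is illustrative.

```python
def one_stage_ratio(rho):
    """V'/V0 for equal costs: equal allocation against the optimum allocation."""
    return 2.0 * (1.0 + rho * rho) / (1.0 + rho) ** 2


if __name__ == "__main__":
    for rho in (1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00):
        print(f"{rho:4.2f}  {one_stage_ratio(rho):.3f}")
```

The loss from equal allocation is negligible near $\rho = 1$ and reaches 25 per cent at $\rho = 3$, which is the margin the two-stage procedure has to work with.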


Table 1. Comparison of V* with V^0 and V' for normal populations

                    ρ:  1.00   1.25   1.50   1.75   2.00   2.25   2.50   2.75   3.00
            m/N
V*/V^0 for  0.2        1.064  1.062  1.058  1.054  1.049  1.044  1.039  1.034  1.030
  N = 30    0.3        1.034  1.032  1.028  1.023  1.018  1.016  1.016  1.018  1.022
            0.4        1.017  1.014  1.012  1.014  1.025  1.039  1.056  1.075  1.094

            0.2        1.032  1.031  1.031  1.029  1.027  1.025  1.022  1.019  1.017
  N = 50    0.3        1.021  1.019  1.017  1.014  1.011  1.010  1.008  1.011  1.016
            0.4        1.013  1.009  1.007  1.010  1.021  1.036  1.055  1.074  1.094

V'/V^0                 1.000  1.012  1.040  1.074  1.111  1.148  1.184  1.218  1.250

2. ASYMPTOTIC EFFICIENCY


The idea of substituting $s_i(m_i)$ for $\sigma_i$ in (0.2) is based on the belief that as $m_i\to\infty$, the ratio
$\tilde n_1/\tilde n_2$ will approach the optimum value $n_1^0/n_2^0$ and $V/V^0$ will approach unity. We prove that
this is the case when the populations are normal, and then we shall prove a similar result
which is true also for other populations.


THEOREM 1. Let $P_i$ be normal, and consider $V^*(A)/V^0(A)$ as given by (1.16). Let $a_1$, $a_2$
and $\rho$ remain fixed while $m_1$, $m_2$ and A become infinite in such a way that
$$0<h\le m_1/m_2\le h'<\infty,\ \text{where }h,\,h'\text{ are constants, and }m_i/A\to0\quad(i=1,2).\qquad(2.1)$$
Then
$$V^*(A)/V^0(A)\to1.\qquad(2.2)$$
Proof. In (1.16),
$$I_1\le q^{1/2}B(r_1+\tfrac12,r_2-\tfrac12)/B(r_1,r_2)+q^{-1/2}B(r_1-\tfrac12,r_2+\tfrac12)/B(r_1,r_2)-2$$
$$=\{q^{1/2}\Gamma(r_1+\tfrac12)\Gamma(r_2-\tfrac12)+q^{-1/2}\Gamma(r_1-\tfrac12)\Gamma(r_2+\tfrac12)\}/\{\Gamma(r_1)\Gamma(r_2)\}-2,$$
which converges to zero, since
$$x^{h}\Gamma(x-h)/\Gamma(x)\to1\quad\text{as }x\to\infty.\qquad(2.3)$$
Also
$$I_2=\Pr\{w<\rho^2/(\rho^2+c^2qb_1^2)\}\le\Pr\{w^{-1}>c^2qb_1^2/\rho^2\}\le E\{w^{-1}\}\,\rho^2/(c^2qb_1^2),$$
and
$$I_3=\Pr\{w>\rho^2/(\rho^2+c^2qb_2^2)\}\le\Pr\{(1-w)^{-1}>\rho^2/(c^2qb_2^2)\}\le E\{(1-w)^{-1}\}\,c^2qb_2^2/\rho^2.$$
Now
$$E\{w^{-1}\}=\Gamma(r_1-1)\Gamma(r_1+r_2)/\{\Gamma(r_1)\Gamma(r_1+r_2-1)\},$$
and
$$E\{(1-w)^{-1}\}=\Gamma(r_2-1)\Gamma(r_1+r_2)/\{\Gamma(r_2)\Gamma(r_1+r_2-1)\},$$
both of which remain bounded on account of (2.1) and (2.4). Since $b_1\to\infty$ and $b_2\to0$, we have
$$(\rho-cb_1)^2b_1^{-1}I_2\to0\quad\text{and}\quad(\rho-cb_2)^2b_2^{-1}I_3\to0.$$
Hence (2.2) is proved.
From the expression (1.11) for V(A), it follows that
$$V(A)/V^0(A)\to1.$$

Next, we remove the restriction that the $P_i$ be normal, assuming only that they have
finite variances $\sigma_i^2$ and that we know functions $f_i$ such that the statistics
$$s_i(n)=f_i(X_{i,1},\ldots,X_{i,n};\,n)\quad(i=1,2)\qquad(2.4)$$
satisfy conditions (I)-(III) below.

Let $F_i(s;n)=\Pr\{s_i(n)\le s\}$ and $C_i(\epsilon)=[\sigma_i-\epsilon,\ \sigma_i+\epsilon]$.
We assume:
(I) There exists an $\alpha>1$ such that for every fixed $\epsilon>0$,
$n^{\alpha}\Pr\{s_i(n)\notin C_i(\epsilon)\}$ is bounded for all $n>0$, $i=1,2$;

(II) There exists an $\epsilon>0$ and $<\min(\sigma_1,\sigma_2)$ such that $n\,\bigl|\int_{C_i(\epsilon)}s^{k}\,dF_i(s;n)-\sigma_i^{k}\bigr|$ is bounded
for all $n>0$ and $k=2,\,-2$;

(III) Either $t_i(n)$ and $s_i(n)$ from the same sample are a pair of mutually independent
random variables, or $P_i$ has a finite fourth moment.

We shall follow the two-stage procedure given by (1.2)-(1.6) with $s_i(m_i)$ as given by (2.4)
instead of (1.1). Then we have

THEOREM 2. Let $a_1$, $a_2$ and $\rho$ remain fixed while $m_1$, $m_2$ and A become infinite in such
a way that
$$(2.1)\text{ holds, and }A/m^{\frac12(1+\alpha)}\text{ is bounded.}\qquad(2.5)$$
Then
$$E\{t_1(\tilde n_1)-t_2(\tilde n_2)\}\to\theta_1-\theta_2,\qquad(2.6)$$
$$V(A)/V^0(A)\to1.\qquad(2.7)$$

Proof. We shall first prove the following statements:
Assumption (II) also holds for $k=1$ and $-1$; (2.8)
$A^{k}m\{E(\tilde n_i^{-k})-(n_i^0)^{-k}\}$ is bounded for $k=1,2$; (2.9)
$A^{2}m^{2}E(\tilde n_i^{-4})\to0$. (2.10)

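To prove (2.8) for $k=-1$, we note that
$$\int_{C_i(\epsilon)}s^{-1}\,dF_i(s;n)\le\Bigl\{\int_{C_i(\epsilon)}s^{-2}\,dF_i(s;n)\Bigr\}^{1/2}=\sigma_i^{-1}\{1+O(n^{-1})\};$$
and since $s_i^{-1}=\sigma_i^{-1}\{1+(s_i^2\sigma_i^{-2}-1)\}^{-1/2}\ge\sigma_i^{-1}\{1-\tfrac12(s_i^2\sigma_i^{-2}-1)\}$
for $s_i\in C_i(\epsilon)$, we have
$$\int_{C_i(\epsilon)}s^{-1}\,dF_i(s;n)\ge\sigma_i^{-1}\Bigl\{\tfrac32\int_{C_i(\epsilon)}dF_i(s;n)-\tfrac12\int_{C_i(\epsilon)}s^{2}\sigma_i^{-2}\,dF_i(s;n)\Bigr\}=\sigma_i^{-1}\{1+O(n^{-1})\},\ \text{by (I) and (II).}$$
Hence (2.8) is proved for $k=-1$, and in the same way for $k=1$. We need only prove (2.9)
and (2.10) for $i=1$, since the proof for $i=2$ is similar. Moreover, from the definitions of
$\tilde n_i$ and $n_i^*$, it is evident that (2.9) and (2.10) hold if and only if they hold with $\tilde n_1$ replaced
by $n_1^*$.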

Let $m=(m_1,m_2)$ and set
$$v_m=s_2(m_2)/s_1(m_1),\qquad(2.11)$$
and
$$v_m^*=\begin{cases}N_m&\text{if }v_m>N_m=(A-a_1m_1)/(ca_1m_1),\\\delta_m&\text{if }v_m<\delta_m=a_2m_2/(cA-ca_2m_2),\\v_m&\text{if }\delta_m\le v_m\le N_m.\end{cases}\qquad(2.12)$$
Then
$$\delta_m=O(m_2A^{-1})\to0\quad\text{and}\quad N_m=O(m_1^{-1}A)\to\infty,\qquad(2.13)$$
and
$$(n_1^*)^{-1}=a_1(1+cv_m^*)/A.\qquad(2.14)$$
Consequently, to prove (2.9), we need only show that
$$m_1\{E(v_m^*)^{k}-\rho^{k}\}\text{ is bounded for }k=1,2.\qquad(2.15)$$
Let us choose an $\epsilon$ to satisfy (II); we can, by (2.13), choose $m_1$, $m_2$, A large enough so that
$$\delta_m<(\sigma_2-\epsilon)/(\sigma_1+\epsilon)\quad\text{and}\quad N_m>(\sigma_2+\epsilon)/(\sigma_1-\epsilon),$$
and hence such that $\{s_i(m_i)\in C_i(\epsilon)\ (i=1,2)\}\Rightarrow\delta_m<v_m<N_m$.
Under these circumstances, we have from (2.12)
$$0\le E(v_m^*)^{k}-\int_{C_1(\epsilon)}\int_{C_2(\epsilon)}s_1^{-k}s_2^{k}\,dF_2(s_2;m_2)\,dF_1(s_1;m_1)\le N_m^{k}\,\phi(m_1,m_2),\qquad(2.16)$$
where
$$\phi(m_1,m_2)=1-\prod_{i=1,2}\Pr\{s_i(m_i)\in C_i(\epsilon)\}=O(m^{-\alpha}).$$
By (II), the second term in the middle in (2.16) is $\rho^{k}+O(m^{-1})$, and by (2.13) the last term
is $O(A^{k}m^{-k}m^{-\alpha})=O(m^{-1})$ for $k=1,2$. Thus, we have (2.15) and hence (2.9). We can prove
(2.10) similarly.


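Now let
$$T_1=m_1t_1(m_1)/\tilde n_1-m_2t_2(m_2)/\tilde n_2,$$
$$T_2=(\tilde n_1-m_1)\,t_1(\tilde n_1-m_1)/\tilde n_1-(\tilde n_2-m_2)\,t_2(\tilde n_2-m_2)/\tilde n_2,\qquad(2.17)$$
where $(\tilde n_i-m_i)\,t_i(\tilde n_i-m_i)=\sum_{j=m_i+1}^{\tilde n_i}X_{i,j}$.
Then
$$E\{t_1(\tilde n_1)-t_2(\tilde n_2)\}=E\{T_1\}+E_{\tilde n}E\{T_2\mid\tilde n\}.\qquad(2.18)$$
Since $\tilde n_i$ depends only on $\{X_{i,j},\ j=1,2,\ldots,m_i\}$ and $t_i(\tilde n_i-m_i)$, for fixed $\tilde n_i$, only on
$\{X_{i,j},\ j=m_i+1,\ldots,\tilde n_i\}$,
we have
$$\Pr\{t_i(\tilde n_i-m_i)\le x_i,\ i=1,2\mid\tilde n_i=n_i\}=\prod_{i=1,2}\Pr\{t_i(n_i-m_i)\le x_i\}.\qquad(2.19)$$
Therefore
$$E\{T_2\mid\tilde n\}=(\tilde n_1-m_1)\theta_1/\tilde n_1-(\tilde n_2-m_2)\theta_2/\tilde n_2,$$
so that
$$E_{\tilde n}E\{T_2\mid\tilde n\}=\theta_1-\theta_2-m_1\theta_1E(1/\tilde n_1)+m_2\theta_2E(1/\tilde n_2)\to\theta_1-\theta_2\quad\text{by (2.5) and (2.9).}$$
Moreover,
$$m_iE\{t_i(m_i)/\tilde n_i\}\le m_i\{E\,t_i^2(m_i)\,E(1/\tilde n_i^2)\}^{1/2}=A^{-1}m_i\,O(1)\,\{A^{2}E(1/\tilde n_i^2)\}^{1/2}\to0\quad\text{by (2.5) and (2.9).}$$
Therefore, $E\,T_1\to0$, and hence we have (2.6).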
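Finally,
$$AV(A)=A\operatorname{var}(T_1+T_2)=A\operatorname{var}T_1+A\operatorname{var}T_2+2A\operatorname{cov}(T_1,T_2).\qquad(2.20)$$
$$A\operatorname{var}T_1=A\operatorname{var}\{m_1t_1(m_1)/\tilde n_1-m_2t_2(m_2)/\tilde n_2\}$$
$$\le2Am_1^2\operatorname{var}\{t_1(m_1)/\tilde n_1\}+2Am_2^2\operatorname{var}\{t_2(m_2)/\tilde n_2\}$$
$$=2\sum_iAm_i^2\operatorname{var}\{[t_i(m_i)-\theta_i]/\tilde n_i+\theta_i/\tilde n_i\}$$
$$\le4\sum_iAm_i^2E\{[t_i(m_i)-\theta_i]^2/\tilde n_i^2\}+4\sum_iAm_i^2\theta_i^2\operatorname{var}(1/\tilde n_i).\qquad(2.21)$$
From (2.5) and (2.9) we know that $Am_i^2\operatorname{var}(1/\tilde n_i)\to0$. As for the other term on the right-hand
side in (2.21), if $t_i(m_i)$ and $s_i(m_i)$ are independent, we have
$$Am_i^2E\{[t_i(m_i)-\theta_i]^2/\tilde n_i^2\}=Am_i^2E\{[t_i(m_i)-\theta_i]^2\}\,E(1/\tilde n_i^2)=Am_i\sigma_i^2E(1/\tilde n_i^2)\to0\quad\text{by (2.9).}$$
If $t_i(m_i)$ and $s_i(m_i)$ are not independent, we still have
$$Am_i^2E\{[t_i(m_i)-\theta_i]^2/\tilde n_i^2\}\le Am_i^2\{E[t_i(m_i)-\theta_i]^4\,E(1/\tilde n_i^4)\}^{1/2}$$
$$=\{m_i^2E[t_i(m_i)-\theta_i]^4\cdot A^2m_i^2E(1/\tilde n_i^4)\}^{1/2}\to0\quad\text{by (III) and (2.10).}$$
Hence, $A\operatorname{var}(T_1)\to0$.
We shall see below that $A\operatorname{var}(T_2)$ is bounded, so that
$$A\operatorname{cov}(T_1,T_2)\le\{A\operatorname{var}(T_1)\,A\operatorname{var}(T_2)\}^{1/2}\to0.$$
Therefore
$$\lim AV(A)=\lim A\operatorname{var}T_2=\lim AE_{\tilde n}\operatorname{var}(T_2\mid\tilde n)+\lim A\operatorname{var}E(T_2\mid\tilde n)$$
$$=\lim AE_{\tilde n}\operatorname{var}\{(\tilde n_1-m_1)t_1(\tilde n_1-m_1)/\tilde n_1-(\tilde n_2-m_2)t_2(\tilde n_2-m_2)/\tilde n_2\}$$
$$\quad+\lim A\operatorname{var}\{(\tilde n_1-m_1)\theta_1/\tilde n_1-(\tilde n_2-m_2)\theta_2/\tilde n_2\}$$
$$=\lim AE_{\tilde n}\Bigl\{\sum_i(\tilde n_i-m_i)\sigma_i^2/\tilde n_i^2\Bigr\}+\lim A\operatorname{var}\{m_2\theta_2/\tilde n_2-m_1\theta_1/\tilde n_1\}$$
$$=\lim\sum_i\sigma_i^2AE(1/\tilde n_i)-\lim\sum_iAm_i\sigma_i^2E(1/\tilde n_i^2)+\lim A\operatorname{var}\{m_2\theta_2/\tilde n_2-m_1\theta_1/\tilde n_1\}.\qquad(2.22)$$
The last two terms are zero on account of (2.5) and (2.9); and from (2.9) we see that the first
term on the right-hand side of (2.22) is the required limit in (2.7).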
If we use sample sizes $n_i=(A/a_i)\,a_i^{1/2}\big/\sum_ja_j^{1/2}$, which by (0.2) is what we would be led to do if
we thought that $\sigma_1=\sigma_2$, the variance of the estimate of $\theta_1-\theta_2$ would be
$$V'=V^0\{1+c(1-\rho)^2/(1+\rho c)^2\}>V^0\quad\text{for }\rho\ne1.$$
Hence, asymptotically, the two-stage procedure is more efficient than this one-stage
procedure if $\rho\ne1$.
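The expression for $V'$ can be checked in two lines from the definitions. The derivation below is offered as a verification sketch rather than as part of the original argument; it assumes only $c=(a_2/a_1)^{1/2}$, $\rho=\sigma_2/\sigma_1$ and $V^0(A)=A^{-1}a_1\sigma_1^2(1+\rho c)^2$.

```latex
\begin{align*}
V' &= \sum_i \frac{\sigma_i^2}{n_i}
    = \frac{1}{A}\Bigl(\sum_j a_j^{1/2}\Bigr)\Bigl(\sum_i a_i^{1/2}\sigma_i^2\Bigr)
    = \frac{a_1\sigma_1^2}{A}\,(1+c)(1+c\rho^2),\\[4pt]
\frac{V'}{V^0} &= \frac{(1+c)(1+c\rho^2)}{(1+\rho c)^2}
    = \frac{(1+\rho c)^2 + c(1-\rho)^2}{(1+\rho c)^2}
    = 1+\frac{c(1-\rho)^2}{(1+\rho c)^2}.
\end{align*}
```

Since $c>0$, the ratio exceeds unity whenever $\rho$ differs from 1, whichever of the two populations is the more variable.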


EXAMPLES. (1) If the $P_i$ are normal, the conditions of Theorem 2 are satisfied for every
$\alpha>0$, and hence in (2.5) A may increase as any power of $m_i$. We have seen in Theorem 1 that
it is actually not necessary to restrict A to be of the order of a power of $m_i$.

(2) If the $P_i$ are Poisson, $\sigma_i^2=\theta_i$ and $s_i^2(n)=t_i(n)$. From the fact that $nt_i(n)$ has a Poisson
distribution, it can be seen that the conditions of Theorem 2 are satisfied for every $\alpha>0$.

(3) If the $P_i$ are binomial, with $\Pr\{X_i=1\}=\theta_i$ and $\Pr\{X_i=0\}=1-\theta_i$, we have
$\sigma_i^2=\theta_i(1-\theta_i)$ and $s_i^2(n)=t_i(n)\{1-t_i(n)\}$. Using the fact that $nt_i(n)$ is a binomial variate,
we can show that the conditions of the theorem are satisfied for every $\alpha>0$.

(4) If we do not know the forms of $P_i$, we would use the estimate $s_i(m_i)$ of $\sigma_i$ given by (1.1).
If we know that $P_i$ has a sufficient number, say 8, of moments finite, we can show that
Theorem 2 is true for the procedure given by (1.1)-(1.6).

A BIVARIATE GENERALIZATION OF STUDENT'S t-DISTRIBUTION,
WITH TABLES FOR CERTAIN SPECIAL CASES†


By CHARLES W. DUNNETT‡ AND MILTON SOBEL


Cornell University


A multivariate generalization of Student’s t-distribution is considered. The bivariate case is treated in 
detail; exact and asymptotic expressions for the probability integral and an asymptotic expression 
for certain percentage points are obtained. The main results for the bivariate case are given as equa- 
tions (10), (11), (23) and (30) below; these equations are used to construct tables for certain special 


cases. 
1. INTRODUCTION 
Consider the joint distribution of p variates $t_i=z_i/s$ $(i=1,2,\ldots,p)$, the $z_i$ having a non-singular
multivariate normal distribution with means 0, common unknown variance $\sigma^2$
and known correlation matrix $\{\rho_{ij}\}$ and $ns^2/\sigma^2$ having a $\chi^2$-distribution, independent of
the $z_i$, with n degrees of freedom.

The joint density of $z_1,\ldots,z_p$, s is clearly
$$\frac{2A^{1/2}\,(n/2\sigma^2)^{\frac12(n+p)}}{(n\pi)^{\frac12p}\,\Gamma(\tfrac12n)}\,s^{n-1}\exp\Bigl\{-\frac{1}{2\sigma^2}\Bigl[\sum_{i,j=1}^{p}a_{ij}z_iz_j+ns^2\Bigr]\Bigr\}\,dz_1\ldots dz_p\,ds,\qquad(1)$$
where A is the determinant of the positive definite matrix $\{a_{ij}\}=\{\rho_{ij}\}^{-1}$. Letting $z_i=t_is$
$(i=1,2,\ldots,p)$ and integrating out s, we obtain the expression
$$\frac{A^{1/2}\,\Gamma(\tfrac12[n+p])}{(n\pi)^{\frac12p}\,\Gamma(\tfrac12n)}\Bigl[1+\frac1n\sum_{i,j=1}^{p}a_{ij}t_it_j\Bigr]^{-\frac12(n+p)}dt_1\ldots dt_p\qquad(2)$$
for the joint density of $t_1,t_2,\ldots,t_p$. As $n\to\infty$, (2) approaches the standardized p-variate
normal density with correlation matrix $\{\rho_{ij}\}$. For $p=1$, it reduces to Student's t-distribution
with n degrees of freedom. Thus it may be considered as a multivariate generalization
of Student's t-distribution.
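As a check on the constants in (2), the sketch below evaluates the density for p = 2 from the general expression and compares it with the bivariate closed form $\{2\pi\sqrt{(1-\rho^2)}\}^{-1}[1+(u^2-2\rho uv+v^2)/\{n(1-\rho^2)\}]^{-\frac12(n+2)}$. The function names are illustrative, and the inverse correlation matrix is written out by hand for the 2 × 2 case.

```python
import math


def density_from_general_form(t1, t2, rho, n):
    """Density (2) with p = 2, using {a_ij} = inverse of [[1, rho], [rho, 1]]."""
    p = 2
    det_inv = 1.0 / (1.0 - rho * rho)      # A, the determinant of {a_ij}
    a11 = a22 = det_inv
    a12 = -rho * det_inv
    quad = a11 * t1 * t1 + 2.0 * a12 * t1 * t2 + a22 * t2 * t2
    log_const = (0.5 * math.log(det_inv) + math.lgamma(0.5 * (n + p))
                 - 0.5 * p * math.log(n * math.pi) - math.lgamma(0.5 * n))
    return math.exp(log_const - 0.5 * (n + p) * math.log1p(quad / n))


def bivariate_closed_form(u, v, rho, n):
    """The same density written out explicitly for the bivariate case."""
    s = 1.0 - rho * rho
    core = 1.0 + (u * u - 2.0 * rho * u * v + v * v) / (n * s)
    return core ** (-0.5 * (n + 2)) / (2.0 * math.pi * math.sqrt(s))


if __name__ == "__main__":
    print(density_from_general_form(0.3, -1.2, 0.5, 7))
    print(bivariate_closed_form(0.3, -1.2, 0.5, 7))
```

The two evaluations agree because $\Gamma(\tfrac12 n+1)/\{n\pi\,\Gamma(\tfrac12 n)\}=1/(2\pi)$, so the gamma functions drop out when p = 2.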

The distribution defined by (2) arose in connexion with certain multiple decision problems 
concerned with the ranking of p + 1 populations according to their population means when 
the population distributions are assumed to be normal with a common unknown variance. 
These applications are discussed in a separate paper (Bechhofer, Dunnett & Sobel, 1954). 

The distribution (2) with $\rho_{ij}=\tfrac12$ $(i\ne j)$ also can be used in connexion with a multiple
decision problem considered by Paulson (1952). The symbol A,, defined on the top of 
p. 244 of his paper is, in our terminology, the equi-coordinate percentage point (defined 
for p = 2 in §2 of this paper); for k=3 (p= 2) these percentage points are given as 
t,(P; 4) in Table 3. 

The major purpose of the present paper is to derive expressions for the probability
integral and the equi-coordinate percentage points for the bivariate case of (2). Certain
useful tables computed from these expressions are presented. The problem of computing
equi-coordinate percentage points for the multivariate case of (2), using Table 1, will be
discussed in a further Note which it is hoped to publish in the next issue of this Journal.


† This research was sponsored by the United States Air Force, through the Office of Scientific
Research of the Air Research and Development Command.
‡ Now at Lederle Laboratories, Pearl River, New York.

2. EXACT EXPRESSION FOR THE BIVARIATE PROBABILITY INTEGRAL


For the bivariate case (p = 2), it will be convenient to replace $t_1$, $t_2$ and $\rho_{12}$ by u, v and $\rho$,
respectively. Then (2) reduces to
$$g_n(u,v;\rho)=\frac{1}{2\pi\sqrt{(1-\rho^2)}}\Bigl[1+\frac{u^2-2\rho uv+v^2}{n(1-\rho^2)}\Bigr]^{-\frac12(n+2)}.\qquad(3)$$
We wish to evaluate the probability integral
$$P_n(h,k;\rho)=\int_{-\infty}^{h}\int_{-\infty}^{k}g_n(u,v;\rho)\,du\,dv.\qquad(4)$$
The integral of (3) over any region of the type $h_1<u<h_2$, $k_1<v<k_2$, finite or infinite, can
be expressed in terms of (4) and the probability integral of the univariate Student t-distribution.

As a first step in the evaluation of (4) we make the transformation
$$r\cos\theta=\frac{u-\rho v}{\sqrt{(1-\rho^2)}},\qquad r\sin\theta=v,\qquad(5)$$
the Jacobian of which is $r\sqrt{(1-\rho^2)}$. We obtain
$$P_n(h,k;\rho)=\tfrac14(1+\operatorname{sgn}h)(1+\operatorname{sgn}k)-\int_{C(h,k;\rho)}^{\pi}\int_{k\csc\theta}^{\infty}\phi_n(r)\,dr\,d\theta-\int_{C(k,h;\rho)}^{\pi}\int_{h\csc\theta}^{\infty}\phi_n(r)\,dr\,d\theta,\qquad(6)$$
where
$$\phi_n(r)=\frac{r}{2\pi}\Bigl(1+\frac{r^2}{n}\Bigr)^{-\frac12(n+2)},\qquad C(h,k;\rho)=\arctan\frac{k\sqrt{(1-\rho^2)}}{h-\rho k},$$
and
$$\operatorname{sgn}m=\begin{cases}+1&\text{if }m\ge0,\\-1&\text{if }m<0.\end{cases}$$
In (6) and throughout the remainder of the paper we assume, unless stated otherwise, that
for any real A, B we have
$$0\le\arctan\frac{A}{B}<2\pi,$$
and the angle is to be interpreted as lying in the interval $(0,\tfrac12\pi)$, $(\tfrac12\pi,\pi)$, $(\pi,\tfrac32\pi)$, or $(\tfrac32\pi,2\pi)$
according as the signs of (A, B) are (+,+), (+,-), (-,-), or (-,+). For $h=k=0$, the
lower limits on $\theta$ are indeterminate, but if we set $h/k=\xi$ the limiting value $P_n(0,0;\rho)$,
given in (15) below, is independent of $\xi$. If $\rho=\pm1$ and $h=\rho k$, we define $\arctan\dfrac{k\sqrt{(1-\rho^2)}}{h-\rho k}$
to be $\tfrac12\pi$ if k is positive and $\tfrac32\pi$ if k is negative and similarly for $\arctan\dfrac{h\sqrt{(1-\rho^2)}}{k-\rho h}$.
The function $\phi_n(r)$ is immediately integrable with respect to r, and if we define
$$Q_{\frac12n}(m,h,k)=\frac{1}{2\pi}\int_{C(h,k;\rho)}^{\pi}\Bigl(1+\frac{k^2\csc^2\theta}{m}\Bigr)^{-\frac12n}d\theta,\qquad(7)$$

then the result of the integration can be expressed as 


P(h, k; p) = £(1+sgnh) (1+sgn k)—Q,,,(n, h, k) —Q,,(n, k, h). (8) 


at 
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The recursion formula 
Qin(m, h, k) vn Qin-1(™, h, k) 
k 1 Tr —| 
ei V(mm) (1 + k2/m)je—D ao M {1 +sgn (h —pk) Lym,n, 1014, 4(m—1)}} (9) 
is useful in reducing (8). Here 
™ (h — pk)? 
s(n, hE) = Th ob + (1p) (m+ BF) 
_ [7% Tp +9) a 
and Levon, P> VD) = I, Tp) Tq) y?-*(1—y)* "dy 


is the incomplete beta function tabulated by Pearson (1931). If we make repeated applica- 
tions of (9) in (8), and compute Q,(n, h, k)+Qo(n, k, h) and Q,(n,h, k) + Q,(n, k, hh), we obtain 
for even ” 








v(1—p?) 

=p 

k *Tj-) 1 
nm), TG) (+2 ]n) 
h #®Tj-) 1 


+Fjam 2 PG) amma +880 (b—ph) Leon (bsJ-B (10) 





P,(h, k; p) = are tan 








+5 (1 +8gn (b— pk) Len,n,0 (45D 








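and for odd $n$

$$\begin{aligned}
P_n(h, k; \rho) = {} & \frac{1}{2\pi}\arctan\frac{\sqrt{n}\,\left[-(h+k)(hk + \rho n) - (hk - n)\sqrt{h^2 - 2\rho hk + k^2 + n(1-\rho^2)}\right]}{(hk - n)(hk + \rho n) - n(h+k)\sqrt{h^2 - 2\rho hk + k^2 + n(1-\rho^2)}} \\
& + \frac{k}{4\sqrt{n\pi}}\sum_{j=1}^{\frac{1}{2}(n-1)}\frac{\Gamma(j)}{\Gamma(j+\frac{1}{2})}\,\frac{1}{(1 + k^2/n)^{j}}\left[1 + \operatorname{sgn}(h - \rho k)\,I_{x(n,h,k)}\left(\tfrac{1}{2}, j\right)\right] \\
& + \frac{h}{4\sqrt{n\pi}}\sum_{j=1}^{\frac{1}{2}(n-1)}\frac{\Gamma(j)}{\Gamma(j+\frac{1}{2})}\,\frac{1}{(1 + h^2/n)^{j}}\left[1 + \operatorname{sgn}(k - \rho h)\,I_{x(n,k,h)}\left(\tfrac{1}{2}, j\right)\right].
\end{aligned} \tag{11}$$

The expression in the first term of (11) can be rationalized and simplified, but the exact
rule of determining the quadrant then becomes involved.

For computational purposes, it is not necessary to use Pearson's tables in evaluating the
incomplete beta functions. Using the reduction formula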
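$$I_x(p, q) = I_x(p-1, q) - \frac{\Gamma(p+q-1)}{\Gamma(p)\,\Gamma(q)}\,x^{p-1}(1-x)^{q}, \tag{12}$$

which is easily derived, we obtain

$$I_x\left(\tfrac{1}{2}, j - \tfrac{1}{2}\right) = \frac{2}{\pi}\arctan\sqrt{\frac{x}{1-x}} + \frac{2}{\pi}\sqrt{x(1-x)}\,\sum_{i=0}^{j-2}\frac{4^i\,(i!)^2}{(2i+1)!}\,(1-x)^i \tag{13}$$

and

$$I_x\left(\tfrac{1}{2}, j\right) = \sqrt{x}\,\sum_{i=0}^{j-1}\frac{(2i)!}{4^i\,(i!)^2}\,(1-x)^i. \tag{14}$$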
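
The closed forms (13) and (14) are easy to check numerically. The sketch below is illustrative only: the function names are invented here, and the quadrature routine is an independent check (a midpoint rule after the substitution $y = s^2$), not a method taken from the paper.

```python
import math

def inc_beta_half_halfint(x, j):
    """I_x(1/2, j - 1/2) for integer j >= 1, using the closed form (13)."""
    series = sum(4**i * math.factorial(i)**2 / math.factorial(2 * i + 1) * (1.0 - x)**i
                 for i in range(j - 1))
    # atan2(sqrt(x), sqrt(1 - x)) equals arctan(sqrt(x / (1 - x))) and is safe at x = 1.
    return (2.0 / math.pi) * (math.atan2(math.sqrt(x), math.sqrt(1.0 - x))
                              + math.sqrt(x * (1.0 - x)) * series)

def inc_beta_half_int(x, j):
    """I_x(1/2, j) for integer j >= 1, using the closed form (14)."""
    return math.sqrt(x) * sum(math.factorial(2 * i) / (4**i * math.factorial(i)**2)
                              * (1.0 - x)**i for i in range(j))

def inc_beta_quadrature(x, q, steps=20000):
    """Midpoint-rule value of I_x(1/2, q); y = s**2 removes the endpoint singularity at 0."""
    const = math.gamma(0.5 + q) / (math.gamma(0.5) * math.gamma(q))
    width = math.sqrt(x) / steps
    total = sum((1.0 - ((i + 0.5) * width)**2)**(q - 1.0) for i in range(steps))
    return const * 2.0 * width * total
```

For example, `inc_beta_half_int(0.3, 3)` and `inc_beta_quadrature(0.3, 3.0)` should agree to several decimal places.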


It is of interest to note that the result for $h = k = 0$ in both (10) and (11) is

$$P_n(0, 0; \rho) = \frac{1}{2\pi}\arctan\frac{\sqrt{1-\rho^2}}{-\rho}, \tag{15}$$

which is independent of $n$ and is therefore identical with the corresponding result for the
bivariate normal integral.
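
The exact formulae (10) and (11) can be evaluated directly for integer $n$. The following Python sketch is one possible transcription, under the assumptions that the arctangent is taken with the quadrant convention stated after (6) and that the incomplete beta functions are obtained from (13) and (14); the function names are illustrative, not the authors'.

```python
import math

def _sgn(m):
    return 1.0 if m >= 0 else -1.0

def _angle(a, b):
    # arctan(a / b) in [0, 2*pi), quadrant chosen from the signs of (a, b)
    return math.atan2(a, b) % (2.0 * math.pi)

def _inc_beta(x, j, even):
    if even:   # I_x(1/2, j - 1/2), closed form (13)
        s = sum(4**i * math.factorial(i)**2 / math.factorial(2 * i + 1) * (1 - x)**i
                for i in range(j - 1))
        return (2 / math.pi) * (math.atan2(math.sqrt(x), math.sqrt(1 - x))
                                + math.sqrt(x * (1 - x)) * s)
    # I_x(1/2, j), closed form (14)
    return math.sqrt(x) * sum(math.factorial(2 * i) / (4**i * math.factorial(i)**2)
                              * (1 - x)**i for i in range(j))

def _side_sum(h, k, rho, n):
    """One of the two finite sums in (10) or (11); call again with h and k swapped."""
    d = (h - rho * k)**2
    x = d / (d + (1 - rho**2) * (n + k * k))
    even = (n % 2 == 0)
    total = 0.0
    for j in range(1, n // 2 + 1):          # n/2 terms (even n) or (n-1)/2 terms (odd n)
        if even:
            coef = math.gamma(j - 0.5) / math.gamma(j) / (1 + k * k / n)**(j - 0.5)
        else:
            coef = math.gamma(j) / math.gamma(j + 0.5) / (1 + k * k / n)**j
        total += coef * (1 + _sgn(h - rho * k) * _inc_beta(x, j, even))
    return k / (4 * math.sqrt(n * math.pi)) * total

def bivariate_t_cdf(h, k, rho, n):
    """P_n(h, k; rho) for integer n >= 1 and |rho| < 1."""
    if n % 2 == 0:
        lead = _angle(math.sqrt(1 - rho**2), -rho)
    else:
        root = math.sqrt(h * h - 2 * rho * h * k + k * k + n * (1 - rho**2))
        num = math.sqrt(n) * (-(h + k) * (h * k + rho * n) - (h * k - n) * root)
        den = (h * k - n) * (h * k + rho * n) - n * (h + k) * root
        lead = _angle(num, den)
    return lead / (2 * math.pi) + _side_sum(h, k, rho, n) + _side_sum(k, h, rho, n)
```

As a usage check, `bivariate_t_cdf(0.0, 0.0, 0.5, n)` should return one-third for every $n$, in line with (15), and `bivariate_t_cdf(0.5, 0.5, 0.5, 1)` should return one-half, in line with the first entry of Table 3.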
3. ASYMPTOTIC EXPRESSION FOR THE BIVARIATE PROBABILITY INTEGRAL


Since the number of terms in (10) and (11) above increases with $n$, the usefulness of these
expressions is limited to small values of $n$. An asymptotic series in powers of $n^{-1}$ will now
be derived, the first few terms of which will yield a good approximation to the probability
integral even for moderately small values of $n$. The method of derivation is essentially the
same as that used by Fisher (1925) to approximate the probability integral of the univariate
Student $t$-distribution.

Consider the probability density function $g_n(u, v; \rho)$ defined by (3), and the limiting
normal form

$$g(u, v; \rho) = \frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left[-\frac{u^2 - 2\rho uv + v^2}{2(1-\rho^2)}\right], \tag{16}$$

obtained when $n \to \infty$. We will express the difference $g_n(u, v; \rho) - g(u, v; \rho)$ as a power series
in $n^{-1}$ and then integrate this series term by term over the desired region of integration.
The desired integral of (3) then is equal to the integral of (16), which has been tabulated by
Pearson (1931), plus a series of correction terms.

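It is convenient to let

$$r^2 = \frac{u^2 - 2\rho uv + v^2}{1-\rho^2}, \tag{17}$$

which is in accordance with the notation introduced in (5). A straightforward logarithmic
expansion in powers of $n^{-1}$ gives

$$-\frac{n+2}{2}\log\left(1 + \frac{r^2}{n}\right) = -\frac{r^2}{2} + \left(\frac{r^4}{4} - r^2\right)\frac{1}{n} - \left(\frac{r^6}{6} - \frac{r^4}{2}\right)\frac{1}{n^2} + \left(\frac{r^8}{8} - \frac{r^6}{3}\right)\frac{1}{n^3} - \left(\frac{r^{10}}{10} - \frac{r^8}{4}\right)\frac{1}{n^4} + \dots \tag{18}$$

Hence, by (3) and (16), we can write

$$\begin{aligned}
\frac{g_n(u, v; \rho)}{g(u, v; \rho)} &= \exp\left\{\left(\frac{r^4}{4} - r^2\right)\frac{1}{n} - \left(\frac{r^6}{6} - \frac{r^4}{2}\right)\frac{1}{n^2} + \left(\frac{r^8}{8} - \frac{r^6}{3}\right)\frac{1}{n^3} - \left(\frac{r^{10}}{10} - \frac{r^8}{4}\right)\frac{1}{n^4} + \dots\right\} \\
&= 1 + \left(\frac{r^4}{4} - r^2\right)\frac{1}{n} + \left(\frac{r^8}{32} - \frac{5r^6}{12} + r^4\right)\frac{1}{n^2} + \left(\frac{r^{12}}{384} - \frac{7r^{10}}{96} + \frac{13r^8}{24} - r^6\right)\frac{1}{n^3} \\
&\qquad + \left(\frac{r^{16}}{6144} - \frac{r^{14}}{128} + \frac{17r^{12}}{144} - \frac{77r^{10}}{120} + r^8\right)\frac{1}{n^4} + \dots \\
&= 1 + D(r), \text{ say.}
\end{aligned} \tag{19}$$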
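
A quick numerical comparison can confirm the coefficients in (19). The sketch below evaluates the exact ratio $g_n/g$, which by (3), (16) and (17) depends on $u$ and $v$ only through $r$, and compares it with $1 + D(r)$ truncated after the $n^{-4}$ term. The function names are illustrative assumptions.

```python
import math

def density_ratio_exact(r, n):
    """g_n / g as a function of r, from (3), (16) and (17)."""
    return math.exp(-0.5 * (n + 2) * math.log1p(r * r / n) + 0.5 * r * r)

def density_ratio_series(r, n):
    """1 + D(r), truncated after the n**-4 term of (19)."""
    s = r * r                                   # s stands for r**2
    c1 = s**2 / 4 - s
    c2 = s**4 / 32 - 5 * s**3 / 12 + s**2
    c3 = s**6 / 384 - 7 * s**5 / 96 + 13 * s**4 / 24 - s**3
    c4 = s**8 / 6144 - s**7 / 128 + 17 * s**6 / 144 - 77 * s**5 / 120 + s**4
    return 1 + c1 / n + c2 / n**2 + c3 / n**3 + c4 / n**4
```

The discrepancy between the two functions should shrink roughly like $n^{-5}$ for fixed $r$.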
Then the desired probability integral satisfies the relationship

$$\int_{-\infty}^{k}\int_{-\infty}^{h}\left[g_n(u, v; \rho) - g(u, v; \rho)\right]du\,dv = \int_{-\infty}^{k}\int_{-\infty}^{h} D(r)\,g(u, v; \rho)\,du\,dv. \tag{20}$$

Applying the transformation (5), and integrating with respect to $r$, the right-hand side
of (20) can be written, after some algebraic simplification, as

$$-\int_{C(h,k;\rho)}^{\pi} F(k, k\cot\theta)\,k\csc^2\theta\,d\theta - \int_{C(k,h;\rho)}^{\pi} F(h, h\cot\theta)\,h\csc^2\theta\,d\theta, \tag{21}$$

where

$$\begin{aligned}
F(k, x) = \frac{k}{2\pi}\,e^{-\frac{1}{2}(x^2+k^2)}\Big\{ & \frac{1}{4n}\left[x^2 + k^2\right] + \frac{1}{96n^2}\left[3x^6 + (9k^2 - 16)x^4 + (9k^4 - 32k^2)x^2 + (3k^6 - 16k^4)\right] \\
& + \frac{1}{384n^3}\big[x^{10} + (5k^2 - 16)x^8 + (10k^4 - 64k^2 + 48)x^6 \\
& \qquad + (10k^6 - 96k^4 + 144k^2)x^4 + (5k^8 - 64k^6 + 144k^4)x^2 + (k^{10} - 16k^8 + 48k^6)\big] \\
& + \frac{1}{92160n^4}\big[15x^{14} + (105k^2 - 480)x^{12} + (315k^4 - 2880k^2 + 4160)x^{10} \\
& \qquad + (525k^6 - 7200k^4 + 20800k^2 - 9216)x^8 + (525k^8 - 9600k^6 + 41600k^4 - 36864k^2)x^6 \\
& \qquad + (315k^{10} - 7200k^8 + 41600k^6 - 55296k^4)x^4 + (105k^{12} - 2880k^{10} + 20800k^8 - 36864k^6)x^2 \\
& \qquad + (15k^{14} - 480k^{12} + 4160k^{10} - 9216k^8)\big] + \dots\Big\}.
\end{aligned} \tag{22}$$
If we make the transformation $w = k\cot\theta$ in the first integral of (21), and $w = h\cot\theta$
in the second integral, and integrate with respect to $w$ in each of them, then our final result,
after further algebraic simplification, becomes

$$\int_{-\infty}^{k}\int_{-\infty}^{h} g_n(u, v; \rho)\,du\,dv = \int_{-\infty}^{k}\int_{-\infty}^{h} g(u, v; \rho)\,du\,dv + F^*(a, k) + F^*(b, h), \tag{23}$$


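where

$$\begin{aligned}
F^*(a, k) = {} & -\frac{k\,H(k)\,G(a)}{2}\Big\{\frac{1}{2n}(k^2 + 1) + \frac{1}{48n^2}(3k^6 - 7k^4 - 5k^2 - 3) \\
& \qquad + \frac{1}{192n^3}(k^{10} - 11k^8 + 14k^6 + 6k^4 - 3k^2 - 15) \\
& \qquad + \frac{1}{46080n^4}(15k^{14} - 375k^{12} + 2225k^{10} - 2141k^8 - 939k^6 - 213k^4 + 915k^2 + 945) + \dots\Big\} \\
& + \frac{a\,k\,H(a)\,H(k)}{2}\Big\{\frac{1}{2n} + \frac{1}{48n^2}\left[9k^4 + k^2(9a^2 - 5) + (3a^4 - a^2 - 3)\right] \\
& \qquad + \frac{1}{192n^3}\big[5k^8 + k^6(10a^2 - 34) + k^4(10a^4 - 46a^2 + 6) + k^2(5a^6 - 29a^4 - a^2 - 3) \\
& \qquad\qquad + (a^8 - 7a^6 - a^4 - 5a^2 - 15)\big] \\
& \qquad + \frac{1}{46080n^4}\big[105k^{12} + k^{10}(315a^2 - 1935) + k^8(525a^4 - 4575a^2 + 7075) \\
& \qquad\qquad + k^6(525a^6 - 5925a^4 + 11975a^2 - 939) + k^4(315a^8 - 4365a^6 + 11045a^4 - 71a^2 - 213) \\
& \qquad\qquad + k^2(105a^{10} - 1725a^8 + 5275a^6 + 61a^4 + 305a^2 + 915) \\
& \qquad\qquad + (15a^{12} - 285a^{10} + 1025a^8 + 9a^6 + 63a^4 + 315a^2 + 945)\big] + \dots\Big\}.
\end{aligned} \tag{24}$$

In this expression we have written

$$H(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}, \qquad G(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{1}{2}u^2}\,du,$$

$$a = \frac{h - \rho k}{\sqrt{1-\rho^2}} \qquad\text{and}\qquad b = \frac{k - \rho h}{\sqrt{1-\rho^2}}.$$

The numerical values of $H$ and $G$ are conveniently obtained from the tables of the probability
function given by the National Bureau of Standards (1953).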











158 A bivariate generalization of Student’s t-distribution 


4. ASYMPTOTIC EXPRESSION FOR THE EQUI-COORDINATE PERCENTAGE POINTS 


For any $P$ ($0 < P < 1$), we define a point $(h, k)$ to be a $100P$-percentage point if it satisfies
the relationship

$$P = \int_{-\infty}^{k}\int_{-\infty}^{h} g_n(u, v; \rho)\,du\,dv. \tag{25}$$

For fixed $P$, this defines a curve in the $(h, k)$ plane. Of particular interest in the applications
is the unique point on this curve where $h = k$; this may be called the equi-coordinate
$100P$-percentage point and the common value of the co-ordinates denoted by $t = t_n(P; \rho)$.

For fixed $P$, the value of $t$ can be determined for any $n$ by trial and error using formula
(10) or (11). However, this procedure becomes laborious with increasing $n$. In this section
we shall derive an asymptotic expression in powers of $n^{-1}$ which expresses $t$ in terms of the
corresponding quantity $x$ for the bivariate normal distribution; the latter can be obtained
by interpolation in the tables of K. Pearson (1931). This expression yields a good approximation
to $t$ even for moderately small values of $n$. The method of derivation is essentially
the same as that used by Fisher (1941) in deriving an asymptotic expression for the percentage
points of the univariate Student $t$-distribution.

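By definition of $t$ and $x$, we have

$$P = \int_{-\infty}^{t}\int_{-\infty}^{t} g_n(u, v; \rho)\,du\,dv = \int_{-\infty}^{x}\int_{-\infty}^{x} g(u, v; \rho)\,du\,dv. \tag{26}$$

Expanding the right-hand member of (26) in a Taylor series as a function of $x$ around $x = t$,
we obtain

$$\begin{aligned}
P = {} & \int_{-\infty}^{t}\int_{-\infty}^{t} g(u, v; \rho)\,du\,dv + 2(x - t)\,HG + (x - t)^2\,H\{-tG + G'\} + \tfrac{1}{3}(x - t)^3\,H\{(t^2 - 1)G - t(c^2 + 2)G'\} \\
& + \tfrac{1}{12}(x - t)^4\,H\{-(t^3 - 3t)G + [t^2(c^4 + 3c^2 + 3) - (c^2 + 3)]G'\} + \dots,
\end{aligned} \tag{27}$$

where $c = \sqrt{(1-\rho)/(1+\rho)}$, $H = H(t)$, $G = G(ct)$ and $G' = \dfrac{dG(ct)}{dt} = cH(ct)$.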


On the other hand, using (23) with $h = k = t$, the middle member of (26) gives the following
alternative expression for $P$:

$$\begin{aligned}
P = {} & \int_{-\infty}^{t}\int_{-\infty}^{t} g(u, v; \rho)\,du\,dv - \frac{tH}{2n}\{(t^2 + 1)G - tG'\} \\
& - \frac{tH}{48n^2}\{(3t^6 - 7t^4 - 5t^2 - 3)G - tG'[3t^4(c^4 + 3c^2 + 3) - t^2(c^2 + 5) - 3]\} \\
& - \frac{tH}{192n^3}\{(t^{10} - 11t^8 + 14t^6 + 6t^4 - 3t^2 - 15)G - tG'[t^8(c^8 + 5c^6 + 10c^4 + 10c^2 + 5) \\
& \qquad - t^6(7c^6 + 29c^4 + 46c^2 + 34) - t^4(c^4 + c^2 - 6) - t^2(5c^2 + 3) - 15]\} \\
& - \frac{tH}{46080n^4}\{(15t^{14} - 375t^{12} + 2225t^{10} - 2141t^8 - 939t^6 - 213t^4 + 915t^2 + 945)G \\
& \qquad - tG'[15t^{12}(c^{12} + 7c^{10} + 21c^8 + 35c^6 + 35c^4 + 21c^2 + 7) \\
& \qquad - 15t^{10}(19c^{10} + 115c^8 + 291c^6 + 395c^4 + 305c^2 + 129) \\
& \qquad + 5t^8(205c^8 + 1055c^6 + 2209c^4 + 2395c^2 + 1415) + t^6(9c^6 + 61c^4 - 71c^2 - 939) \\
& \qquad + t^4(63c^4 + 305c^2 - 213) + 15t^2(21c^2 + 61) + 945]\} - \dots
\end{aligned} \tag{28}$$
Equating (27) and (28), we note that the first terms of each are identical and can be cancelled.
If we equate the second terms, we can obtain a solution for $x - t$ as a function of $t$
which is correct to terms of $O(n^{-1})$. Using the result and equating the second and third
terms, we can obtain a solution correct to terms of $O(n^{-2})$. Continuing in this way, we
obtain the following:

$$\begin{aligned}
x = t & - \frac{t}{4n}\left\{t^2 + 1 - \frac{tG'}{G}\right\} \\
& - \frac{t}{96n^2}\Big\{-13t^4 - 8t^2 - 3 - \frac{tG'}{G}\left[t^4(3c^4 + 9c^2) + t^2(-c^2 - 17) - 6\right] + \Big(\frac{tG'}{G}\Big)^2(-9t^2 - 6) + 3\Big(\frac{tG'}{G}\Big)^3\Big\} \\
& - \frac{t}{384n^3}\Big\{35t^6 + 19t^4 + t^2 - 15 - \frac{tG'}{G}\big[t^8(c^8 + 5c^6 + 7c^4) + t^6(-7c^6 - 32c^4 - 57c^2) \\
& \qquad + t^4(-c^4 - 3c^2 + 55) + t^2(-6c^2 + 18) - 12\big] + \Big(\frac{tG'}{G}\Big)^2\big[t^6(-6c^4 - 21c^2) \\
& \qquad + t^4(-3c^4 - 13c^2 + 50) + t^2(-2c^2 + 40) + 9\big] \\
& \qquad - \Big(\frac{tG'}{G}\Big)^3\big[t^4(-3c^4 - 12c^2 + 14) + t^2(-2c^2 + 36) + 12\big] \\
& \qquad + \Big(\frac{tG'}{G}\Big)^4\big[t^2(-c^2 + 13) + 9\big] - 3\Big(\frac{tG'}{G}\Big)^5\Big\} \\
& - \frac{t}{92160n^4}\Big\{-6271t^8 - 3224t^6 + 102t^4 + 1680t^2 + 945 \\
& \qquad - \frac{tG'}{G}\big[t^{12}(15c^{12} + 105c^{10} + 255c^8 + 225c^6) \\
& \qquad + t^{10}(-285c^{10} - 1785c^8 - 4245c^6 - 4185c^4) + t^8(1025c^8 + 5695c^6 + 13265c^4 + 17275c^2) \\
& \qquad + t^6(9c^6 + 241c^4 + 1769c^2 - 11319) + t^4(48c^4 + 1040c^2 - 3548) \\
& \qquad + t^2(420c^2 + 2820) + 1800\big] + \Big(\frac{tG'}{G}\Big)^2\big[t^{10}(-165c^8 - 960c^6 - 1575c^4) \\
& \qquad + t^8(-60c^8 + 390c^6 + 3900c^4 + 11430c^2) + t^6(330c^6 + 2245c^4 + 7860c^2 - 13765) \\
& \qquad + t^4(120c^4 + 2680c^2 - 11380) + t^2(750c^2 - 1320) + 1440\big] \\
& \qquad - \Big(\frac{tG'}{G}\Big)^3\big[t^8(-105c^8 - 750c^6 - 735c^4 + 2700c^2) \\
& \qquad + t^6(270c^6 + 2670c^4 + 10080c^2 - 7020) + t^4(385c^4 + 3880c^2 - 14735) \\
& \qquad + t^2(1140c^2 - 6600) - 180\big] + \Big(\frac{tG'}{G}\Big)^4\big[t^6(-90c^6 + 570c^4 + 3750c^2 - 1350) \\
& \qquad + t^4(510c^4 + 3750c^2 - 9315) + t^2(870c^2 - 7620) - 1710\big] \\
& \qquad - \Big(\frac{tG'}{G}\Big)^5\big[t^4(255c^4 + 1515c^2 - 2310) + t^2(525c^2 - 4755) - 1620\big] \\
& \qquad + \Big(\frac{tG'}{G}\Big)^6\big[t^2(150c^2 - 1275) - 900\big] - \Big(\frac{tG'}{G}\Big)^7\big[-225\big]\Big\} - \dots
\end{aligned} \tag{29}$$
Using a cumulative term-by-term procedure similar to that described above, the series
(29) can be inverted to give $t$ as a function of $x$. The final result may be written as follows,
where $G$ and $G'$ now represent the same functions as in (27) with the argument changed
from $t$ to $x$:

$$\begin{aligned}
t = x & + \frac{x}{4n}\left\{x^2 + 1 - \frac{xG'}{G}\right\} \\
& + \frac{x}{96n^2}\Big\{5x^4 + 16x^2 + 3 - \frac{xG'}{G}\left[x^4(3c^4 + 3c^2) + x^2(-7c^2 + 13) + 12\right] + \Big(\frac{xG'}{G}\Big)^2\left[x^2(-6c^2 - 3) + 12\right] \\
& \qquad - \Big(\frac{xG'}{G}\Big)^3\left[3\right]\Big\} \\
& + \frac{x}{384n^3}\Big\{3x^6 + 19x^4 + 17x^2 - 15 - \frac{xG'}{G}\big[x^8(c^8 + 2c^6 + c^4) \\
& \qquad + x^6(-10c^6 - 7c^4 + 8c^2) + x^4(24c^4 + 6c^2 + 13) + x^2(-29c^2 + 43) - 3\big] \\
& \qquad + \Big(\frac{xG'}{G}\Big)^2\big[x^6(-6c^6 - 9c^4 - 3c^2) + x^4(38c^4 + 23c^2 - 4) + x^2(-60c^2 + 15) + 36\big] \\
& \qquad - \Big(\frac{xG'}{G}\Big)^3\big[x^4(12c^4 + 12c^2 + 2) + x^2(-46c^2 - 16) + 39\big] \\
& \qquad + \Big(\frac{xG'}{G}\Big)^4\big[x^2(-10c^2 - 5) + 18\big] - \Big(\frac{xG'}{G}\Big)^5\big[3\big]\Big\} \\
& + \frac{x}{92160n^4}\Big\{79x^8 + 776x^6 + 1482x^4 - 1920x^2 - 945 \\
& \qquad - \frac{xG'}{G}\big[x^{12}(15c^{12} + 45c^{10} + 45c^8 + 15c^6) + x^{10}(-345c^{10} - 705c^8 - 375c^6 - 15c^4) \\
& \qquad + x^8(2195c^8 + 2845c^6 + 775c^4 + 125c^2) \\
& \qquad + x^6(-5241c^6 - 3589c^4 + 379c^2 + 471) + x^4(5478c^4 - 290c^2 + 3032) \\
& \qquad + x^2(-1950c^2 + 1770) - 3240\big] + \Big(\frac{xG'}{G}\Big)^2\big[x^{10}(-210c^{10} - 525c^8 - 420c^6 - 105c^4) \\
& \qquad + x^8(2880c^8 + 4890c^6 + 1980c^4 - 30c^2) \\
& \qquad + x^6(-11830c^6 - 11445c^4 - 900c^2 - 155) + x^4(19240c^4 + 7600c^2 + 400) \\
& \qquad + x^2(-12780c^2 + 5130) - 360\big] - \Big(\frac{xG'}{G}\Big)^3\big[x^8(915c^8 + 1830c^6 + 1095c^4 + 180c^2) \\
& \qquad + x^6(-8370c^6 - 10950c^4 - 3000c^2 + 60) \\
& \qquad + x^4(22835c^4 + 15080c^2 - 145) + x^2(-23280c^2 - 2280) + 7110\big] \\
& \qquad + \Big(\frac{xG'}{G}\Big)^4\big[x^6(-1860c^6 - 2790c^4 - 1110c^2 - 90) \\
& \qquad + x^4(11280c^4 + 10050c^2 + 1395) + x^2(-18960c^2 - 6480) + 9360\big] \\
& \qquad - \Big(\frac{xG'}{G}\Big)^5\big[x^4(1965c^4 + 1965c^2 + 390) + x^2(-7245c^2 - 3285) + 5760\big] \\
& \qquad + \Big(\frac{xG'}{G}\Big)^6\big[x^2(-1050c^2 - 525) + 1800\big] - \Big(\frac{xG'}{G}\Big)^7\big[225\big]\Big\} + \dots
\end{aligned} \tag{30}$$
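
To illustrate how (30) is used, the sketch below evaluates its first two correction brackets for a given normal percentage point $x$. It is an illustration under stated assumptions: the function names are invented here, only the $n^{-1}$ and $n^{-2}$ terms are kept, and the coefficients are expressed relative to $x$ (that is, as coefficients of $t/x$), which is the scaling that appears to match the entries of Table 6 as extracted.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ratio_coefficients(x, rho):
    """b1, b2 in t / x = 1 + b1 / n + b2 / n**2 + ..., from the first two brackets of (30)."""
    c2 = (1.0 - rho) / (1.0 + rho)              # c**2
    c = math.sqrt(c2)
    gam = x * c * normal_pdf(c * x) / normal_cdf(c * x)    # x G'/G with argument x
    b1 = (x * x + 1.0 - gam) / 4.0
    b2 = (5 * x**4 + 16 * x**2 + 3
          - gam * (x**4 * (3 * c2**2 + 3 * c2) + x**2 * (-7 * c2 + 13) + 12)
          + gam**2 * (x**2 * (-6 * c2 - 3) + 12)
          - 3 * gam**3) / 96.0
    return b1, b2

def percentage_point(x, rho, n):
    """Two-term approximation to t_n(P; rho), given the normal point x."""
    b1, b2 = ratio_coefficients(x, rho)
    return x * (1.0 + b1 / n + b2 / n**2)
```

With $x = 0.39351$ and $\rho = +0.5$ (the $P = 0.50$ normal point from Table 3), `percentage_point(0.39351, 0.5, 30)` comes out close to the tabulated 0.397.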
5. TABLES AND NUMERICAL RESULTS


Consider the expression $P_n(h, k; \rho)$ with $h = k = t$, say, and denote the resulting expression
by $P_n(t; \rho)$. This expression is tabulated for selected values of $t$ and $n$ = 1 (1) 30 (3) 45 (15) 120,
150, 300, 600, $\infty$ for $\rho = +0.5$ in Table 1 and for $\rho = -0.5$ in Table 2. The computation was
carried out in the following manner. For each $t$, the exact formulae, (10) and (11), were used
for $n$ = 1, 2, etc., until agreement was reached in the fifth decimal place with the asymptotic
formula (23). For all subsequent values of $n$, the asymptotic formula (23) was used and
checked in specific instances by formulae (10) and (11). In all cases, the result was rounded
in the usual way to the nearest digit in the fifth decimal place. By this procedure, the entries
in Tables 1 and 2 should be correct to the number of figures given except in a few cases
where the error may be as much as 0.0000055.

It should be noted that if $P_n(t; \rho)$ is required for $t$ negative, it can be obtained from the
corresponding expression for $t$ positive using the relation

$$P_n(-t; \rho) = 1 + P_n(t; \rho) - 2F_n(t), \tag{31}$$

where $F_n(t)$ represents the c.d.f. of the univariate Student $t$-distribution with $n$ degrees of
freedom, which is tabulated, for example, by Hartley & Pearson (1950).

The value of $t_n(P; \rho)$ is tabulated for selected values of $P$ and $n$ = 1 (1) 30 (3) 60 (15) 120,
150, 300, 600, $\infty$ for $\rho = +0.5$ in Table 3 and for $\rho = -0.5$ in Table 4. For each $P$, the exact
formulae (10) and (11) were used in a trial and error manner to determine the smallest value
of $t$ to three places of decimals which achieves the desired value of $P$, for $n$ = 1, 2, etc., until
agreement was reached in the third decimal place with the asymptotic formula (30). For
all subsequent values of $n$, the asymptotic formula (30) was used and checked in specific
instances by formulae (10) and (11). By this procedure, the entries in the table were always
rounded to the next higher value and should be in error by at most one unit in the last figure
given.
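
The trial-and-error search described above can be mimicked with a bisection. The sketch below is deliberately restricted to $n = 2$, where (10) with $h = k = t \ge 0$ collapses to a single term because $I_x(\tfrac{1}{2}, \tfrac{1}{2}) = (2/\pi)\arcsin\sqrt{x}$. Bisection is a substitute for the hand search and is not claimed to be the authors' method; the function names and the rounding helper are illustrative.

```python
import math

def p2_equicoordinate(t, rho):
    """P_2(t, t; rho) for t >= 0: formula (10) with n = 2 and h = k = t."""
    lead = (math.atan2(math.sqrt(1 - rho**2), -rho) % (2 * math.pi)) / (2 * math.pi)
    d = ((1 - rho) * t)**2
    x = d / (d + (1 - rho**2) * (2 + t * t))
    inc_beta = (2 / math.pi) * math.asin(math.sqrt(x))     # I_x(1/2, 1/2)
    return lead + t / (2 * math.sqrt(2 + t * t)) * (1 + inc_beta)

def equicoordinate_point(p, rho, lo=0.0, hi=100.0, tol=1e-10):
    """Bisection for the t with P_2(t, t; rho) = p; requires p above P_2(0, 0; rho)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p2_equicoordinate(mid, rho) < p:
            lo = mid
        else:
            hi = mid
    return hi

def round_up(value, places=3):
    """Round to the next higher value, as the text describes for Tables 3 and 4."""
    scale = 10**places
    return math.ceil(value * scale - 1e-9) / scale
```

For $P = 0.50$ this should reproduce the $n = 2$ entries 0.446 ($\rho = +0.5$) and 0.744 ($\rho = -0.5$).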

In Tables 5 and 6 we give the numerical values of the coefficients in the asymptotic
expressions used in the construction of Tables 1, 2, 3 and 4. It follows from (23) and (30)
that, for $h = k = t$, we can write the first five terms of the asymptotic expressions for $P_n(t; \rho)$
and $t_n(P; \rho)$ in the form

$$P_n(t; \rho) = A_0 + \frac{A_1}{n} + \frac{A_2}{n^2} + \frac{A_3}{n^3} + \frac{A_4}{n^4} + \dots \tag{32}$$

and

$$t_n(P; \rho) = B_0 + \frac{B_1}{n} + \frac{B_2}{n^2} + \frac{B_3}{n^3} + \frac{B_4}{n^4} + \dots \tag{33}$$

For the special cases $\rho = \pm\tfrac{1}{2}$, Table 5 gives $A_0$, $A_1$, $A_2$, $A_3$ and $A_4$ for selected values of $t$
and Table 6 gives $B_0$, $B_1$, $B_2$, $B_3$ and $B_4$ for selected values of $P$. $A_0$ and $B_0$ were obtained by
interpolation in Pearson's tables (1931). The remaining coefficients were calculated using
(23) and (30). For large $n$, it is clear that the only limit on the accuracy of the asymptotic
formulae is the accuracy to which $A_0$ and $B_0$ can be determined. In the last column of these
tables are given the smallest values of $n$ for which (32) and (33) agree with the exact formulae
(using the same rounding procedure in the asymptotic case as in the corresponding exact
case) to the number of places given in Tables 1, 2, 3 and 4.
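
The coefficients $A_1$ and $A_2$ can be recomputed from the first two correction terms of (28). The sketch below does this for $h = k = t$; the function names are illustrative, and $A_0$ (the bivariate normal integral) is not computed here.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def correction_coefficients(t, rho):
    """A_1 and A_2 of (32), taken from the n**-1 and n**-2 terms of (28)."""
    c2 = (1.0 - rho) / (1.0 + rho)          # c**2
    c = math.sqrt(c2)
    H = normal_pdf(t)                        # H(t)
    G = normal_cdf(c * t)                    # G(ct)
    Gp = c * normal_pdf(c * t)               # G' = c H(ct)
    a1 = -(t * H / 2.0) * ((t * t + 1.0) * G - t * Gp)
    a2 = -(t * H / 48.0) * ((3 * t**6 - 7 * t**4 - 5 * t**2 - 3) * G
                            - t * Gp * (3 * t**4 * (c2**2 + 3 * c2 + 3)
                                        - t**2 * (c2 + 5) - 3))
    return a1, a2
```

For $t = 0.25$ and $\rho = +0.5$ this should give values close to the Table 5 entries $A_1 = -0.025870$ and $A_2 = 0.003371$.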


The authors wish to express their indebtedness to Prof. R. E. Bechhofer, under whose 
guidance this work was done, and also to Mrs Shirley Hockett who computed the major 
part of the tables. 
Table 3. Percentage points of a bivariate t-distribution with ρ = +0.5

 Degrees of                     Probability (P)
 freedom (n)     0.50       0.75       0.90       0.95       0.99
      1         0.500      1.708      4.696      9.511     47.733
      2         0.446      1.291      2.539      3.805      8.879
      3         0.428      1.186      2.130      2.939      5.483
      4         0.420      1.138      1.963      2.611      4.408
      5         0.414      1.111      1.874      2.441      3.900
      6         0.411      1.094      1.817      2.337      3.608
      7         0.409      1.082      1.779      2.268      3.419
      8         0.407      1.073      1.751      2.218      3.287
      9         0.405      1.066      1.730      2.180      3.190
     10         0.404      1.061      1.714      2.151      3.115
     11         0.403      1.056      1.701      2.128      3.056
     12         0.402      1.053      1.690      2.109      3.009
     13         0.402      1.050      1.680      2.093      2.969
     14         0.401      1.047      1.673      2.079      2.936
     15         0.401      1.045      1.666      2.067      2.908
     16         0.400      1.043      1.660      2.057      2.884
     17         0.400      1.041      1.655      2.049      2.863
     18         0.400      1.040      1.651      2.041      2.844
     19         0.399      1.038      1.647      2.034      2.828
     20         0.399      1.037      1.643      2.028      2.813
     21         0.399      1.036      1.640      2.022      2.800
     22         0.399      1.035      1.637      2.017      2.788
     23         0.398      1.034      1.634      2.013      2.778
     24         0.398      1.033      1.632      2.009      2.768
     25         0.398      1.033      1.629      2.005      2.759
     26         0.398      1.032      1.627      2.001      2.751
     27         0.398      1.031      1.626      1.998      2.743
     28         0.398      1.031      1.624      1.995      2.736
     29         0.397      1.030      1.622      1.992      2.730
     30         0.397      1.029      1.621      1.990      2.724
     33         0.397      1.028      1.617      1.983      2.708
     36         0.397      1.027      1.613      1.977      2.695
     39         0.397      1.026      1.610      1.972      2.684
     42         0.396      1.025      1.608      1.968      2.674
     45         0.396      1.024      1.606      1.965      2.666
     48         0.396      1.024      1.604      1.962      2.659
     51         0.396      1.023      1.603      1.959      2.653
     54         0.396      1.023      1.601      1.957      2.648
     57         0.396      1.022      1.600      1.954      2.643
     60         0.396      1.022      1.599      1.953      2.639
     75         0.395      1.020      1.594      1.945      2.622
     90         0.395      1.019      1.592      1.941      2.611
    105         0.395      1.019      1.590      1.937      2.604
    120         0.395      1.018      1.588      1.935      2.598
    150         0.395      1.017      1.586      1.931      2.590
    300         0.394      1.016      1.582      1.924      2.574
    600         0.394      1.015      1.580      1.920      2.566
     ∞          0.39351    1.01391    1.57700    1.91634    2.55781

Note. The relationship between h, P and n, where h is the quantity given in the body of the table and
represents the common value of the co-ordinates of the equi-coordinate percentage point (h, h), is given by

$$P = \int_{-\infty}^{h}\int_{-\infty}^{h}\frac{1}{\pi\sqrt{3}}\left[1 + \frac{4(x^2 - xy + y^2)}{3n}\right]^{-\frac{1}{2}(n+2)}dx\,dy.$$

Table 4. Percentage points of a bivariate t-distribution with ρ = -0.5

 Degrees of                     Probability (P)
 freedom (n)     0.50       0.75       0.90       0.95       0.99
      1         0.867      2.225      5.881     11.850     59.385
      2         0.744      1.553      2.864      4.231      9.777
      3         0.708      1.395      2.333      3.161      5.812
      4         0.691      1.325      2.121      2.767      4.595
      5         0.681      1.286      2.008      2.566      4.028
      6         0.674      1.261      1.938      2.444      3.706
      7         0.670      1.243      1.891      2.363      3.499
      8         0.666      1.231      1.857      2.305      3.355
      9         0.664      1.221      1.831      2.261      3.250
     10         0.662      1.213      1.811      2.228      3.169
     11         0.660      1.207      1.794      2.201      3.106
     12         0.658      1.202      1.781      2.179      3.055
     13         0.657      1.197      1.770      2.160      3.012
     14         0.656      1.193      1.760      2.145      2.977
     15         0.655      1.190      1.752      2.131      2.947
     16         0.654      1.187      1.745      2.120      2.921
     17         0.654      1.185      1.739      2.110      2.898
     18         0.653      1.183      1.733      2.101      2.879
     19         0.653      1.181      1.729      2.093      2.861
     20         0.652      1.179      1.724      2.086      2.846
     21         0.652      1.178      1.720      2.080      2.832
     22         0.651      1.176      1.717      2.074      2.819
     23         0.651      1.175      1.714      2.069      2.808
     24         0.651      1.174      1.711      2.064      2.797
     25         0.650      1.173      1.708      2.060      2.788
     26         0.650      1.172      1.705      2.056      2.779
     27         0.650      1.171      1.703      2.052      2.771
     28         0.649      1.170      1.701      2.049      2.764
     29         0.649      1.169      1.699      2.046      2.757
     30         0.649      1.168      1.697      2.043      2.750
     33         0.648      1.166      1.692      2.035      2.734
     36         0.648      1.165      1.688      2.028      2.720
     39         0.648      1.163      1.685      2.023      2.708
     42         0.647      1.162      1.682      2.018      2.698
     45         0.647      1.161      1.679      2.014      2.690
     48         0.647      1.160      1.677      2.011      2.683
     51         0.646      1.159      1.675      2.008      2.676
     54         0.646      1.159      1.674      2.005      2.670
     57         0.646      1.158      1.672      2.003      2.665
     60         0.646      1.157      1.671      2.001      2.661
     75         0.645      1.155      1.666      1.993      2.643
     90         0.645      1.154      1.662      1.987      2.632
    105         0.645      1.153      1.660      1.983      2.624
    120         0.644      1.152      1.658      1.980      2.618
    150         0.644      1.151      1.655      1.976      2.609
    300         0.643      1.149      1.650      1.968      2.593
    600         0.643      1.148      1.648      1.964      2.584
     ∞          0.64235    1.1465     1.64457    1.95993    2.57568

Note. The relationship between h, P and n, where h is the quantity given in the body of the table and
represents the common value of the co-ordinates of the equi-coordinate percentage point (h, h), is given by

$$P = \int_{-\infty}^{h}\int_{-\infty}^{h}\frac{1}{\pi\sqrt{3}}\left[1 + \frac{4(x^2 + xy + y^2)}{3n}\right]^{-\frac{1}{2}(n+2)}dx\,dy.$$
Table 5. Coefficients in the asymptotic expansion of P_n(t; ρ)
for selected values of t and ρ

   ρ       t        A_0          A_1          A_2          A_3          A_4         n
 +0.5    0.25    0.437684    -0.025870     0.003371     0.003816    -0.001050      4
         0.50    0.546245    -0.057784     0.008999     0.006868    -0.002155     [?]
         0.75    0.651102    -0.100016     0.021983     0.006891    -0.001879      6
         1.00    0.745203    -0.150182     0.047374    -0.006835     0.007991      5
         1.25    0.823729    -0.198378     0.079687    -0.033130     0.036817      6
         1.50    0.884709    -0.231628     0.096254    -0.038696     0.032808      6
         1.75    0.928814    -0.240531     0.067469     0.052274    -0.191482      9
         2.00    0.958553    -0.223682    -0.020268     0.293449    -0.819219     12
         2.25    0.977260    -0.187525    -0.149011     0.623867    -1.618705     [?]
         2.50    0.988250    -0.142571    -0.276255     0.858993    -1.765249     [?]
         3.00    0.997382    -0.062685    -0.376815     0.432592     2.236773     18

 -0.5    0.25    0.282077    -0.026677     0.003458     0.003805    -0.001048      4
         0.50    0.419223    -0.067860     0.012805     0.005089    -0.001195      4
         0.75    0.559679    -0.134168     0.044880    -0.012987     0.016283      5
         1.00    0.686472    -0.213243     0.098390    -0.052516     0.044512     [?]
         1.25    0.789599    -0.278618     0.131625    -0.029995    -0.127090      9
         1.50    0.866558    -0.310774     0.105272     0.112851    -0.498494     [?]
         1.75    0.919908    -0.305392     0.014720     0.322909    -0.642101      8
         2.00    0.954503    -0.269699    -0.120697     0.556270    -0.697258      9
         2.25    0.975551    -0.216425    -0.268215     0.770582    -0.926531      8
         2.50    0.987581    -0.158846    -0.387429     0.840638    -0.678115      8
         3.00    0.997300    -0.066478    -0.435425     0.171531     3.163945     17

Table 6. Coefficients in the asymptotic expansion of t_n(P; ρ)
for selected values of P and ρ

   ρ       P        B_0          B_1          B_2          B_3          B_4         n
 +0.5    0.50    1.000000     0.251277     0.039480    -0.030211    -0.008417      2
         0.75    1.000000     0.438753     0.193001     0.038173    -0.007468      3
         0.90    1.000000     0.798448     0.623867     0.356845     0.118483      3
         0.95    1.000000     1.098969     1.140833     0.900948     0.472589      5
         0.99    1.000000     1.832368     3.032947     3.802956     3.464277      8

 -0.5    0.50    1.000000     0.284231     0.068289    -0.009598    -0.009109      2
         0.75    1.000000     0.550243     0.263130     0.102643     0.005626      2
         0.90    1.000000     0.921220     0.810736     0.441786     0.418648      4
         0.95    1.000000     1.209261     1.417233     1.120851     0.582289      6
         0.99    1.000000     1.908503     3.427819     4.670259     4.009603      8

A TWO-SAMPLE MULTIPLE DECISION PROCEDURE FOR
RANKING MEANS OF NORMAL POPULATIONS WITH A
COMMON UNKNOWN VARIANCE†


By ROBERT E. BECHHOFER, CHARLES W. DUNNETT‡ AND MILTON SOBEL


Cornell University


A multiple decision approach to the problem of ranking populations according to their population 
means has been formulated by Bechhofer (1954). A single-sample solution to this problem was pre- 
sented by him for the case of normal populations with known variances. In the present paper the 
case of normal populations with unknown equal variances is considered. A two-sample procedure is 
proposed as a solution to the latter problem; this procedure is of the type used by Stein (1945) for 
obtaining a test of Student’s hypothesis with power independent of the variance. Tables are available 
which enable the experimenter to apply this procedure with little computational effort to the ranking 
of two or three populations with unknown but equal variances. The same tables can be used when 
the variance ratios are arbitrary and known. Graphic comparisons of the expected sample size are 
made for the single-sample and two-sample procedures. 


1. INTRODUCTION 


Because of the widespread popularity of the analysis of variance as a statistical technique 
for analysing experimental data, there sometimes is a strong tendency to misapply it. 
Many problems are forced into the mould of a test of homogeneity (i.e. a test of the equality 
of population means) when actually such a test will not answer the basic problem of the 
experimenter. Moreover, there are many practical situations in which the experimenter 
actually knows a priori that the population means are unequal. In these situations he often 
desires either a complete or partial ranking of the populations according to their population 
means, e.g. he may wish to find the population with the largest mean. Thus from the statis- 
tical point of view, the experimenter requires a multiple decision procedure rather than a test 
of homogeneity. The procedure must provide a rule which tells him how to effect his desired 
ranking of the populations, the rule having the property that the probability of a correct 
ranking will in some sense be controlled. A single-sample procedure has been proposed by 
Bechhofer (1954) as a solution to this problem for the case when the population variances 
are known. This paper proposes a two-sample procedure for the case when the populations 
have a common unknown variance. Except for the assumption of known variances made in 


the first paper, all assumptions made in both papers are the same as those for the linear 
model of the analysis of variance. 


2. ASSUMPTIONS 


Let $X_{ij}$ be normally and independently distributed chance variables from population $\pi_i$
with mean $\mu_i$ and variance $\sigma_i^2 = a_i\sigma^2$ ($i = 1, 2, \dots, k$; $j = 1, 2, \dots$, ad inf.). We assume that
$\sigma^2$ and the $\mu_i$ are unknown, and that the $a_i$ are known positive rational numbers which
without loss of generality can be taken to be integers. (Although the basic problem of interest
is the case $a_i = 1$ ($i = 1, 2, \dots, k$), it will be shown that with little extra effort we can consider
the more general case of known variance ratios described above.)


† The research was sponsored by the United States Air Force, through the Office of Scientific
Research of the Air Research and Development Command.
‡ Now at Lederle Laboratories, Pearl River, New York.

We denote the ranked $\mu_i$ by

$$\mu_{[1]} \le \mu_{[2]} \le \dots \le \mu_{[k]}, \tag{1}$$

and the differences between the ranked $\mu_i$ by

$$\delta_{i,j} = \mu_{[i]} - \mu_{[j]} \qquad (i, j = 1, 2, \dots, k). \tag{2}$$

It is not known which population is associated with $\mu_{[i]}$.


3. THE EXPERIMENTER’S GOAL AND REQUIREMENTS 
We assume that the experimenter wishes to effect either a partial or a complete ranking of 
the populations according to their population means, and that he specifies his goal exactly
before experimentation starts. There are many goals of practical interest, but the following 
two are of special importance and will be treated here in detail: 

Goal I: To select the population having the largest population mean. 

Goal II: To select the populations having the largest, second largest, ..., and smallest
population means. 

Clearly the goal of selecting the population having the smallest population mean is equi- 
valent to Goal I. For a more general goal which includes I and II, above, the reader is 
referred to Bechhofer (1954). 

For Goal I it is assumed that the experimenter can specify the smallest value of $\delta_{k,k-1}$,
say $\delta^*$, that he is interested in detecting. The experimenter also must specify the smallest
acceptable value, say $P$, for the probability of achieving Goal I when $\delta_{k,k-1} \ge \delta^*$.

For Goal II it is assumed that the experimenter can specify the smallest values of the
$\delta_{i+1,i}$, say $\delta^*_{i+1,i}$ ($i = 1, 2, \dots, k-1$), that he is interested in jointly detecting. He also must
specify the smallest acceptable value, say $P$, for the probability of achieving Goal II when
$\delta_{i+1,i} \ge \delta^*_{i+1,i}$ ($i = 1, 2, \dots, k-1$).

4. THE PROCEDURE

It should be noted that a single-sample procedure cannot satisfy the requirements of the
problem, i.e. for Goal I, for example, such a procedure cannot guarantee the probability $P$
of a correct ranking when $\delta_{k,k-1} \ge \delta^*$, irrespective of the true values of the unknown variances $\sigma_i^2$.
Roughly speaking, this is so because for any reasonable procedure which calls for taking
a single sample of $N_i$ observations from $\pi_i$ ($i = 1, 2, \dots, k$), the probability of a correct ranking
will depend on the $\sigma_i^2$. If the true values of the $\sigma_i^2$ are sufficiently large, then it is intuitively
clear that the probability of a correct ranking will be arbitrarily close to $1/k$ for Goal I and
$1/k!$ for Goal II. A rigorous treatment of this point would proceed along the lines used by
Dantzig (1940).

The following two-sample procedure which is of the type used by Stein (1945) will be 
shown to satisfy the requirements of the experimenter: 


Procedure 


1. Take a first sample of $a_iN_0$ observations from $\pi_i$ ($i = 1, 2, \dots, k$). (Recall that the
$a_i$ ($i = 1, 2, \dots, k$) are known integers. Any integer $N_0$, for which $n$ (defined in (4) below) is
positive, will satisfy the requirements of the problem; the question of an optimum choice
of $N_0$ will be discussed in § 5.)

2. Calculate†

$$S_0^2 = \frac{1}{n}\sum_{i=1}^{k}\frac{1}{a_i}\sum_{j=1}^{a_iN_0}\left(X_{ij} - \frac{1}{a_iN_0}\sum_{j=1}^{a_iN_0}X_{ij}\right)^2, \tag{3}$$

which is an unbiased estimate of $\sigma^2$ having

$$n = N_0\sum_{i=1}^{k}a_i - k \tag{4}$$

degrees of freedom.

† For simplicity of notation no attempt will be made in this paper to distinguish between chance
variables and their observed values.

3. Take a second sample of $(N - N_0)a_i$ observations from $\pi_i$ ($i = 1, 2, \dots, k$), where

$$N = \max\{N_0, [2S_0^2(h_n/\delta^*)^2]\}, \tag{5}$$

the symbol $[y]$ denoting the smallest integer equal to or greater than $y$; here $h_n = h$ (say)
is a positive constant (depending on $n$, $P$ and the goal), which will be defined below.

4. Calculate for each $\pi_i$ the over-all sample mean

$$\bar X_i = \frac{1}{a_iN}\sum_{j=1}^{a_iN}X_{ij} \qquad (i = 1, 2, \dots, k). \tag{6}$$

Denote the ranked† values of $\bar X_i$ by

$$\bar X_{[1]} < \bar X_{[2]} < \dots < \bar X_{[k]}. \tag{7}$$

5. Rank the $\pi_i$ according to the ranking of the observed $\bar X_i$, i.e. for Goal I select the
population which gave rise to $\bar X_{[k]}$ as the population having the largest population mean,
while for Goal II select the populations which gave rise to $\bar X_{[k]}$, $\bar X_{[k-1]}$, ..., and $\bar X_{[1]}$ as the
populations having the largest, second largest, ..., and smallest population mean, respectively.
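
The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the list-of-lists data layout and the return values are invented for the example, and the constant `h` must be supplied by the user (for $k = 3$ and Goal I it would be read from Table 3 of Dunnett & Sobel, e.g. $h = 2.109$ for $n = 12$, $P = 0.95$).

```python
import math

def second_stage(first_samples, a, h, delta_star):
    """Steps 1-3: first_samples[i] holds the a[i] * N0 first-stage observations from pi_i."""
    k = len(first_samples)
    n0 = len(first_samples[0]) // a[0]           # N0, recovered from the first sample size
    n = n0 * sum(a) - k                           # degrees of freedom, equation (4)
    pooled = 0.0
    for obs, a_i in zip(first_samples, a):
        mean = sum(obs) / len(obs)
        pooled += sum((x - mean)**2 for x in obs) / a_i
    s0_sq = pooled / n                            # equation (3)
    big_n = max(n0, math.ceil(2.0 * s0_sq * (h / delta_star)**2))   # equation (5)
    extra = [(big_n - n0) * a_i for a_i in a]     # second-sample sizes, step 3
    return s0_sq, n, big_n, extra

def select_largest(all_samples):
    """Steps 4-5 for Goal I: index of the population with the largest over-all mean."""
    means = [sum(obs) / len(obs) for obs in all_samples]
    return max(range(len(means)), key=means.__getitem__)
```

For instance, with two populations, `second_stage([[1, 2, 3], [2, 4, 6]], [1, 1], 2.0, 1.0)` gives $S_0^2 = 2.5$ on $n = 4$ degrees of freedom and $N = 20$, so 17 further observations are required from each population. Ties among the sample means are not randomized in this sketch.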

Now we shall show that this procedure, by a proper choice of $h$, satisfies the requirements
of the problem. Since the probability of achieving the goal is a strictly increasing function
of $\delta_{i+1,i}$ ($i = 1, 2, \dots, k-1$), the requirements of the problem will be met if we can guarantee
the specified probability $P$ of a correct ranking for Goal I when

$$\delta_{k,k-1} = \delta^* \quad\text{and}\quad \delta_{i+1,i} = 0 \qquad (i = 1, 2, \dots, k-2) \tag{8}$$

and for Goal II when

$$\delta_{i+1,i} = \delta^*_{i+1,i} \qquad (i = 1, 2, \dots, k-1). \tag{9}$$

First we consider Goal I. Define $\bar X_{(i)}$ as that one of the sample means defined in (6) which
comes from the population having mean $\mu_{[i]}$, i.e. the expectation of $\bar X_{(i)}$ is $\mu_{[i]}$ ($i = 1, 2, \dots, k$).
Then we can write the probability of achieving Goal I as

$$\Pr\{\bar X_{(i)} < \bar X_{(k)}\ (i = 1, 2, \dots, k-1)\} = \Pr\left\{\sqrt{\frac{N}{2}}\,\frac{\bar X_{(i)} - \bar X_{(k)} + \delta^*}{S_0} < \sqrt{\frac{N}{2}}\,\frac{\delta^*}{S_0}\ (i = 1, 2, \dots, k-1)\right\}. \tag{10}$$

Denoting $\sqrt{N/2}\,(\bar X_{(i)} - \bar X_{(k)} + \delta^*)$ by $z_i$ ($i = 1, 2, \dots, k-1$), and noting that $\sqrt{N/2}\,\delta^*/S_0 \ge h$ as
a consequence of (5), it follows that the probability (10) is greater‡ than

$$\Pr\left\{\frac{z_i}{S_0} < h\ (i = 1, 2, \dots, k-1)\right\}. \tag{11}$$

† If, because of the limitations of the measuring instrument, two or more $\bar X_i$ are equal, they should
be 'ranked' by using a randomized procedure which assigns equal probability to each ordering.

‡ It is also possible, following Stein's initial procedure, to devise a procedure which will guarantee
a probability of a correct ranking exactly equal to a specified probability.

When (8) holds, the conditional joint distribution of the $z_i$ given $S_0$ is the multivariate normal
distribution with means 0, variances $\sigma^2$, and correlation matrix $\{\rho_{ij}\}$ where

$$\rho_{ij} = \begin{cases} 1 & \text{if } i = j, \\ \tfrac{1}{2} & \text{if } i \ne j. \end{cases} \tag{12}$$

If in the joint distribution of the $z_i$ and $S_0$ we make the transformation $z_i = t_iS_0$ and integrate
out $S_0$, then (11) can be written

$$\int_{-\infty}^{h}\cdots\int_{-\infty}^{h}\frac{\Gamma[\frac{1}{2}(n+k-1)]}{\sqrt{k}\,(\frac{1}{2}n\pi)^{\frac{1}{2}(k-1)}\,\Gamma(\frac{1}{2}n)}\left[1 + \frac{1}{n}\sum_{i=1}^{k-1}\sum_{j=1}^{k-1}b_{ij}t_it_j\right]^{-\frac{1}{2}(n+k-1)}dt_1\,dt_2\cdots dt_{k-1}, \tag{13}$$

where for Goal I we have

$$b_{ij} = \begin{cases} 2(k-1)/k & \text{if } i = j, \\ -2/k & \text{if } i \ne j. \end{cases} \tag{14}$$

Thus in order to satisfy the requirements of the problem, it is sufficient to set (13) equal to
the specified probability $P$ and solve for $h$.

To solve for $h$, tables of (13) are required. For $k = 2$, (13) reduces to the univariate
Student's $t$-distribution which is tabulated, for example, by Merrington (1942). For $k = 3$,
(13) can be regarded as a bivariate generalization of Student's $t$-distribution with correlation
plus one-half; the required table has been computed and is given as Table 3 in the
immediately preceding paper by Dunnett & Sobel (1954). (This table gives $h$ to three decimal
places for $P$ = 0.50, 0.75, 0.90, 0.95 and 0.99 and $n$ = 1 (1) 30 (3) 60 (15) 120, 150, 300, 600,
$\infty$.) For $k \ge 4$, (13) can be regarded as a multivariate generalization of Student's $t$-distribution;
the required integral has not been tabulated.

Now we consider Goal II. For simplicity we shall assume (see (9)) that $\delta^*_{i+1,i} = \delta^*$
($i = 1, 2, \dots, k-1$). A similar argument to that given for Goal I will show that the probability
of achieving Goal II is greater than the same expression (13) where for Goal II
we have

$$b_{ij} = 2\min(i, j)\left\{1 - \frac{1}{k}\max(i, j)\right\}. \tag{15}$$


Again we determine the constant $h$ to guarantee the specified probability $P$ by referring to
appropriate tables. For $k = 2$, tables of Student’s $t$-distribution are again applicable. For
$k = 3$, (13) now can be regarded as a bivariate generalization of Student’s $t$-distribution with
correlation minus one-half; the required table has been computed and is given by Dunnett
& Sobel (1954) as Table 4. (For this table $h$, $P$ and $n$ are given as in Table 3.) For $k \ge 4$ the
required integral has not been tabulated.

Explicit formulae for the exact (for any $n$) and asymptotic (for large $n$) evaluation of (13)
for $k = 3$ and any correlation coefficient also are given by Dunnett & Sobel (1954).


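5. EXPECTED SAMPLE SIZE
Stein (1945) has given an exact formula as well as upper and lower bounds for the expected
total sample size in his two-sample procedure. His results are applicable to our procedure.
The expected sample size for population $\pi_i$ is $a_i E(N)$, and $E(N)$ is given (to within a quantity
less than unity) by the equation

$$E(N) = N_0 \Pr\left\{\chi^2_n < \frac{nN_0}{2h^2}\left(\frac{\delta^*}{\sigma}\right)^2\right\} + 2h^2\left(\frac{\sigma}{\delta^*}\right)^2 \Pr\left\{\chi^2_{n+2} > \frac{nN_0}{2h^2}\left(\frac{\delta^*}{\sigma}\right)^2\right\} + \theta \Pr\left\{\chi^2_n > \frac{nN_0}{2h^2}\left(\frac{\delta^*}{\sigma}\right)^2\right\}, \qquad (16)$$

where $0 < \theta < 1$ and $\chi^2_n$ is the $\chi^2$ variable with $n$ degrees of freedom.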
A numerical investigation of the relationship between $E(N)$ and $N_0$ was carried out for
Goal I with $k = 3$, for $P = 0.95$ and $\delta^*/\sigma = 0.1\,(0.1)\,0.5$. The curves in Fig. 1 illustrate the
results. They were computed using (16) and an asymptotic formula derived by Goldberg &
Levine (1946).

If the value of $\sigma$ were known, it would be possible to determine an $N$ such that a single-sample
procedure requiring a sample of size $Na_i$ from population $\pi_i$ would satisfy the
requirements of the problem. Tables have been given by Bechhofer (1954) which give
$\sqrt{N}(\delta^*/\sigma)$ as a function of $P$. For each curve in Fig. 1 there is shown for comparison a horizontal
line to represent the corresponding value of $N$ for the single-sample procedure. For
any given $P$ and $\delta^*/\sigma$, $E(N)$ is always greater† than $N$ but, as is shown in Fig. 1, which depicts
the case $P = 0.95$, the difference is small over a wide range of values of $N_0$. Each of the $E(N)$
curves in Fig. 1 appears to reach its minimum value for $N_0$ equal to approximately 80% of
the value of $N$. A feature of the distribution of $N$ which is not evident in the diagram is the
fact that its variance increases as $N_0$ decreases. This is an additional reason why $N_0$ should
not be chosen much smaller than that for which the minimum value of $E(N)$ is obtained.

[Fig. 1. Graph of expected total sample size per population for the single-sample
and two-sample procedures (Goal I, $k = 3$, $P = 0.95$). Vertical axis: $E(N)$, expected total sample size per population divided by $a_i$. Horizontal axis: $N_0$, size of first sample divided by $a_i$.]

In practical situations where $\sigma^2$ is unknown, the experimenter sometimes can place a
non-zero lower bound $\sigma_L^2$ and a finite upper bound $\sigma_U^2$ on its value. To meet the requirements
of the problem, he has at least two reasonable courses of action. He can use a single-sample
procedure and assume that $\sigma^2 = \sigma_U^2$. Alternatively, he can use a two-sample procedure.
The curves in the diagram show that by choosing $N_0$ as if $\sigma^2 = \sigma_L^2$, he may be operating with
$E(N)$ considerably smaller than the $N$ required in the single-sample procedure.


† More precisely, $E(N) > 2(h_n\sigma/\delta^*)^2 > 2(h_\infty\sigma/\delta^*)^2$ and $N = [2(h_\infty\sigma/\delta^*)^2]$.





6. EXAMPLES
The following examples will illustrate the procedure: 


Example 1. Given three normal populations with a common unknown variance. Suppose
that it is desired to select the population having the largest population mean and to guarantee
that the probability of correctly choosing that population will be at least 0.90 when
$\mu_{[3]} - \mu_{[2]} \ge 1$. How should the experimenter proceed?

The choice of $N_0$, the size of the first sample from $\pi_i$ ($i = 1, 2, 3$), is optional. (Note that
$a_i = 1$ ($i = 1, 2, 3$) for this problem.) Suppose that for certain practical reasons the experimenter
decides to choose $N_0 = 10$, and based on his three first samples he finds that

$$\sum_{j=1}^{10} X_{1j} = 95.46, \quad \sum_{j=1}^{10} X_{2j} = 114.54, \quad \sum_{j=1}^{10} X_{3j} = 112.01,$$
$$\sum_{j=1}^{10} X_{1j}^2 = 948.5306, \quad \sum_{j=1}^{10} X_{2j}^2 = 1337.1139, \quad \sum_{j=1}^{10} X_{3j}^2 = 1280.7061. \qquad (17)$$

Using (3) he computes $S_0^2 = 3.2787$, his estimate of $\sigma^2$ based on $n = 27$ degrees of freedom.
Now the experimenter has specified $P = 0.90$ and $\delta^* = 1$. Entering Table 3 in Dunnett &
Sobel (1954) with $n = 27$ and $P = 0.90$ he finds that $h = 1.626$. Using (5) he finds that the
total sample size per population is given by

$$N = \max\{10, [2(3.2787)(1.626)^2]\} = \max\{10, 18\} = 18.$$

Hence $N - N_0 = 8$ additional observations must be taken from each population. Now suppose
that based on his three second samples he finds that

$$\sum_{j=11}^{18} X_{1j} = 84.50, \quad \sum_{j=11}^{18} X_{2j} = 97.41, \quad \sum_{j=11}^{18} X_{3j} = 95.33. \qquad (18)$$

Using the $\sum X_{ij}$ in (17) and (18), he computes the overall means

$$\bar{X}_1 = 9.998, \quad \bar{X}_2 = 11.775, \quad \bar{X}_3 = 11.519, \qquad (19)$$

each of which is based on 18 observations. As his last step, he selects the second population
(the population yielding the largest sample mean, $\bar{X}_2 = 11.775$) as the population having
the largest population mean.

If it had happened that $2S_0^2(1.626)^2 < 10$, no second sample would be necessary, and the
experimenter would base his selection on the sample means computed from (17).
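
The arithmetic of Example 1 can be re-traced with a short script. This is only a sketch: equations (3) and (5) are not reproduced in this part of the paper, so the script assumes that (3) is the usual pooled within-sample variance on $k(N_0 - 1)$ degrees of freedom and that the square bracket in (5) denotes rounding up to the next integer. The variable names are illustrative.

```python
import math

# First-sample totals and sums of squares from (17); N0 = 10 per population.
n0 = 10
sums_first = [95.46, 114.54, 112.01]
sumsq_first = [948.5306, 1337.1139, 1280.7061]

# Assumed form of (3): pooled within-sample variance on k*(N0 - 1) = 27 d.f.
k = len(sums_first)
dof = k * (n0 - 1)
s0_sq = sum(q - s * s / n0 for s, q in zip(sums_first, sumsq_first)) / dof

# Assumed form of (5): N = max{N0, [2 S0^2 (h / delta*)^2]}, bracket read as ceiling.
h, delta_star = 1.626, 1.0
n_total = max(n0, math.ceil(2.0 * s0_sq * (h / delta_star) ** 2))

# Second-sample totals from (18) and the overall means of (19).
sums_second = [84.50, 97.41, 95.33]
means = [(a + b) / n_total for a, b in zip(sums_first, sums_second)]
best = max(range(k), key=lambda i: means[i]) + 1  # 1-based population label

print(round(s0_sq, 4), n_total, [round(m, 3) for m in means], best)
```

Under these assumptions the script gives the same $S_0^2$, $N$ and overall means as quoted in the example, and selects the second population.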

The procedure for handling the problem considered in the next example is not covered 
explicitly in the paper, but is easily justified using the same methods. 


Example 2. Consider the same problem as that described in Example 1, the only difference
being that now there is available to the experimenter an earlier estimate $S^2$ of $\sigma^2$
based on $n$ degrees of freedom obtained from similar experiments made on these or other
normal populations with the same variance. How should the experimenter proceed?

There is no need now for the experimenter to take any first samples, since their major
purpose was to supply the estimate of $\sigma^2$. Suppose that $S^2 = 3.0000$ and $n = 11$. Entering
Table 3 in Dunnett & Sobel (1954) with $n = 11$ and $P = 0.90$ he finds that $h = 1.701$.
Since he has the earlier estimate he needs to take only one sample consisting of

$$N = [2S^2(h/\delta^*)^2] = [2(3.0000)(1.701)^2] = 18$$

observations from each population. He computes his sample means and selects the population
yielding the largest sample mean as the population having the largest population mean.

Note 1. Even if an earlier estimate of $\sigma^2$ is available, the experimenter still may want to
adopt the two-sample procedure, particularly when the number of degrees of freedom
associated with his estimate is small. He will do this with the idea in mind of cutting down
on the expected value and variance of $N$. (Recall that for Example 2, $N$ is proportional to
$h^2$ and $h$ increases as the degrees of freedom decrease for fixed $P$. For this problem with
$P = 0.90$ and $n = 1$, we have $h = 4.696$.) If he does adopt the two-sample procedure, the
estimate of $\sigma^2$ obtained from the first samples should be pooled with the earlier estimate
of $\sigma^2$, and the sum of the two degrees of freedom should be used for determining $h$.

Note 2. The user of the tables in Dunnett & Sobel (1954) is cautioned that these tables 
are applicable only if the variances of the sample means, on which the final ranking is to be 
based, are equal. 
INEQUALITIES FOR THE NORMAL INTEGRAL INCLUDING A
NEW CONTINUED FRACTION


By L. R. SHENTON
College of Technology, Manchester


1. INTRODUCTION


In this note we consider the related normal integral ratios

$$R(t) = \int_t^\infty g(x)\,dx \Big/ g(t),$$

sometimes called Mills’s ratio, and

$$\bar{R}(t) = \int_0^t g(x)\,dx \Big/ g(t),$$

where $g(x) = (2\pi)^{-\frac12} e^{-\frac12 x^2}$, and $R(t) + \bar{R}(t) = 1/[2g(t)]$. We give a new continued fraction†
for $\bar{R}(t)$ which is rapidly convergent for small values of $t$ and which incidentally provides
a new set of inequalities. The rapidity of convergence is compared with a series for $\bar{R}(t)$
and with the Laplace c.f. for $R(t)$. This assessment is similar to recent work by Teichroew
(1952) on the comparative rapidity of convergence of series and c.f.’s for the elementary
functions $e^x$, $\ln(1+x)$ and $\arctan x$. Lane (1944) has also considered the same sort of thing
for interpolation, comparing Newton’s series and Thiele’s C.F. in the case of the function 2°. 
We also prove and generalize a conjecture of Birnbaum‡ (1950) that for $t > 0$,

$$R(t) < 4/[3t + \sqrt{(t^2 + 8)}],$$

it being shown that there are two sequences of similar inequalities, increasing and decreasing
to the limiting value $R(t)$. In the Appendix we set out a brief summary of results relating
to the normal integral.


2. THE CONTINUED FRACTION FOR $\bar{R}(t)$
The new c.f. is given by

$$\bar{R}(t) = \frac{t}{1-}\;\frac{t^2}{3+}\;\frac{2t^2}{5-}\;\frac{3t^2}{7+}\;\frac{4t^2}{9-}\;\frac{5t^2}{11+}\cdots, \qquad (1)$$

and is convergent for $t > 0$. This is derived from the c.f. representation of the series

$$\bar{R}(t) = \frac{t}{1} + \frac{t^3}{1.3} + \frac{t^5}{1.3.5} + \ldots, \qquad (2)$$

which may be expressed in terms of the confluent hypergeometric function

$${}_1F_1(a; b; x) = 1 + \frac{ax}{b} + \frac{a(a+1)x^2}{b(b+1)\,2!} + \ldots. \qquad (3)$$

We then use the c.f. of Gauss (1812, pp. 134-5) in its confluent form

$${}_1F_1(1; b; x) = \frac{1}{1-}\;\frac{x}{b+}\;\frac{x}{b+1-}\;\frac{bx}{b+2+}\;\frac{2x}{b+3-}\;\frac{(b+1)x}{b+4+}\cdots. \qquad (4)$$

(See, for example, Perron, 1913, pp. 311-14; Wall, 1948, pp. 349-55.)

† Continued fraction will be abbreviated to c.f. throughout.
‡ Birnbaum actually conjectured $(t^2 - 1)R^2 - 3tR + 2 > 0$.


Biometrika 41 sag 
The expression (1) now follows since $\bar{R}(t) = t\,{}_1F_1(1; \tfrac32; \tfrac12 t^2)$. Denoting the $s$th convergent
of (1) by $r_s = p_s/q_s$, it follows that $p_s$ and $q_s$ satisfy the recurrence relations

$$\left.\begin{aligned} y_{2s+1} &= (4s+1)\,y_{2s} + 2s\,t^2\,y_{2s-1}, \\ y_{2s} &= (4s-1)\,y_{2s-1} - (2s-1)\,t^2\,y_{2s-2}, \end{aligned}\right\} \quad (s = 1, 2, \ldots), \qquad (5)$$

with $p_0 = 0$, $p_1 = t$, $q_0 = 1$, $q_1 = 1$.
It may be shown that for $t > 0$, $q_{4s} > 0$, $q_{4s+1} > 0$, so that from (5) we have

$$r_1 < r_4 < r_5 < r_8 < r_9 < \ldots < \bar{R} \quad (t > 0). \qquad (6)$$

The situation for the remaining convergents is not so simple, but it transpires that

$$r_{4s+2} > r_{4s+3} > r_{4s+6} > r_{4s+7} > \ldots > \bar{R} \quad (t > 0), \qquad (7)$$

for some positive integer $s$. We therefore have the following simple inequalities

$$r_1 = t < r_4 = \frac{105t + 5t^3}{105 - 30t^2 + 3t^4} < r_5 = \frac{945t + 105t^3 + 8t^5}{945 - 210t^2 + 15t^4} < \bar{R} \quad (t > 0), \qquad (8)$$

$$r_2 = \frac{3t}{3 - t^2} > r_3 = \frac{15t + 2t^3}{15 - 3t^2} > r_6 = \frac{10395t + 630t^3 + 63t^5}{10395 - 2835t^2 + 315t^4 - 15t^6} > \bar{R}, \qquad (9)$$

the ranges of $t$ for (9) being $0 < t < \sqrt{3}$, $0 < t < \sqrt{5}$, $0 < t < \sqrt{8.283}$ respectively.
We now write the c.f. as

$$\bar{R}(t) = \frac{t}{1-}\;\frac{t^2}{3+}^{*}\;\frac{2t^2}{5-}^{*}\;\frac{3t^2}{7+}\;\frac{4t^2}{9-}\;\frac{5t^2}{11+}^{*}\;\frac{6t^2}{13-}^{*}\;\frac{7t^2}{15+}\;\frac{8t^2}{17-}\;\frac{9t^2}{19+}^{*}\;\frac{10t^2}{21-}^{*}\cdots, \qquad (10)$$

the convergents corresponding to the asterisks forming a decreasing sequence exceeding
$\bar{R}(t)$, the remaining convergents forming an increasing sequence less than $\bar{R}(t)$, this state
of affairs holding sooner or later for all $t > 0$. It certainly always holds for $0 < t < \sqrt{3}$. Otherwise,
for example, if $t$ = 2, 2.5, 3.0 it holds from the 3rd, 4th and 7th convergents, respectively,
onwards. Further properties of the c.f. are listed in the Appendix.
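
The recurrence (5) is easy to run numerically. The following is a minimal sketch in floating-point arithmetic (the paper itself works with exact integers); the function names are illustrative, and the closed form used for comparison, $\bar{R}(t) = e^{\frac12 t^2}\sqrt{(\tfrac12\pi)}\,\mathrm{erf}(t/\sqrt2)$, follows from the definition of $\bar{R}(t)$ rather than from anything in the text.

```python
import math

def new_cf_convergents(t, count):
    """Return [r_1, ..., r_count] for the c.f. (1), using the recurrence (5)."""
    p_prev, p_cur = 0.0, t      # p_0, p_1
    q_prev, q_cur = 1.0, 1.0    # q_0, q_1
    out = [p_cur / q_cur]
    for n in range(2, count + 1):
        b = 2 * n - 1                                       # partial denominators 3, 5, 7, ...
        a = (n - 1) * t * t * (-1.0 if n % 2 == 0 else 1.0)  # -t^2, +2t^2, -3t^2, ...
        p_prev, p_cur = p_cur, b * p_cur + a * p_prev
        q_prev, q_cur = q_cur, b * q_cur + a * q_prev
        out.append(p_cur / q_cur)   # fails if some q_n vanishes, e.g. t = sqrt(3), n = 2
    return out

def r_bar(t):
    """exp(t^2/2) * integral_0^t exp(-x^2/2) dx, written with math.erf."""
    return math.exp(0.5 * t * t) * math.sqrt(math.pi / 2.0) * math.erf(t / math.sqrt(2.0))

for t, count in [(0.1, 8), (3.0, 20)]:
    print(t, new_cf_convergents(t, count)[-1], r_bar(t))
```

For $t = 1$ the fourth and sixth values returned agree with the rational expressions for $r_4$ and $r_6$ in (8) and (9).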


Numerical illustration 


We give in Table 1 the convergents for the examples $t = 0.1$ and $t = 3.0$. It will be seen
that for $t = 0.1$ eight convergents give accuracy in the 22nd place of decimals, a rapid rate
of convergence of about 6 decimal places for every two convergents. For $t = 3.0$ the convergence
is slow at first, being due to the negative value of some of the early denominators.
However, convergence soon accelerates, there being a gain of four decimal places between
the 16th and 20th convergents. We may mention that the increase in the number of digits
in the convergents of the c.f. creates a computational difficulty. Thus with $t = 2.0$ the 18th
convergent involves 21 digits; with $t = 2.5$ employing an equivalence transformation to
clear decimals, the 22nd convergent involves 35 digits. This situation can be avoided if we
know the number of terms required, or by curtailment at the expense of some loss of
accuracy. Three methods of using c.f.’s on high-speed computing machines are discussed
by Teichroew (1952).

In Table 2 we give the values of the absolute differences $|\nabla r_s| = |r_s - r_{s-1}|$, and the corresponding
differences $|\nabla T_s| = |T_s - T_{s-1}|$ for the series (referred to as series C)

$$\bar{R}(t) = e^{\frac12 t^2}\left\{t - \frac{t^3}{3.2.1!} + \frac{t^5}{5.2^2.2!} - \ldots\right\}$$
Table 1. Convergents of $\bar{R}(t)$

    t        s     r_s
    0.1      1     0.1
             2     0.100,334,4*
             3     0.100,334,001,3*
             4     0.100,334,000,953,2
             5     0.100,334,000,953,439,993
             6     0.100,334,000,953,440,116,2*
             7     0.100,334,000,953,440,116,180,63*
             8     0.100,334,000,953,440,116,180,61
    3.0      1     3
             2     -1.5
             3     -8.2
             7     164.0
             8     98.7
            15     112.515,2*
            16     112.515,1
            19     112.515,152,98*
            20     112.515,152,96

Note. Convergents marked with an asterisk exceed the value of $\bar{R}(t)$.
Table 2. Differences of successive convergents of R(t) and of corresponding 
partial sums of the series C 
| | 
8 | | Vr,| |V7,| ES" | Vr,| |V7,| 
ae 
t=0-1 2 3-33 3-17 t=20 | 6 0-28 0-36 
| 4 9-39 9-30 8 2-52 1-25 
6 15-12 15-24 10 4:70 2-11 
| 8 22-18 21-10 12 6-62 4-33 
~ 4 | 14 8-40 6-72 
| t=0°5 2 1-45 1-24 16 10-19 7 12 
| 4 4-34 4-26 18 13-7 9-16 
5 8-68 7-13 © | 
8 | 12-64 11-36 t= 3-0 14 2-4 0-5 
| 10 | Te36 15-61 16 49 1-4 
| 18 5-2 2-3 
t=1-0 . °F 0-50 0-27 20 7.2 3-2 
| 4 | 36s 2-49 | 22 53 | 56 
ec 421 | 439 | 24 3 | 6 
Z 731 | 617 | 26 i320 |B 7 
10 10-27 | 39-47 | 
12 13-15 | 12-77 | | 
t= 15 2 1-45 | 1-17 | | 
4 0-24 | 0-16 | | | 
6 2:35 | 2-63 | 
s 426 | 314 | 
10 611 | 5:19 

















Digits in bold type before the decimal place + imply that the decimal is to be multiplied by this 
power of ten; thus 1-45 means 0-045 and 1-45 means 4-5. 


+ The notation is due to A. C. Aitken (1925, footnote to p. 293). 
for several values of $t$. These are given for $s$ even, since in this case $\bar{R}(t)$ always lies between
$r_s$ and $r_{s-1}$ for $0 < t < \sqrt{3}$ (and sooner or later for all $t > 0$). Similar remarks apply to $T_s$ and
$T_{s-1}$. Thus the error in the approximations $r_s$ and $T_s$ is less in value than $|\nabla r_s|$ and $|\nabla T_s|$
respectively. It will be observed that the c.f. converges ultimately more rapidly than the
series, as far as the present analysis is concerned. Whether this state of affairs holds for
larger values of $s$ than those considered can only be conjectured. For the early convergents
the series has a slight advantage, especially for small $t$; but for $t > 1$ the c.f. soon becomes
more accurate, although for larger values of $t$ there is the disadvantage of the negative
approximations in the early stages. However, this disadvantage may be overcome by using
the generalization of (1), namely,

$$\bar{R}(t) = \sum_{r=0}^{s-1} \frac{t^{2r+1}}{1.3.5\ldots(2r+1)} + \frac{t^{2s+1}}{1.3\ldots(2s+1)}\left[\frac{2s+1}{2s+1-}\;\frac{(2s+1)t^2}{2s+3+}\;\frac{2t^2}{2s+5-}\;\frac{(2s+3)t^2}{2s+7+}\;\frac{4t^2}{2s+9-}\cdots\right]. \qquad (11)$$

If $s$ is taken large enough in this expression (which converges for $t > 0$) negative approximations
are avoided. Numerical examples support the view that the rate of convergence of
(11) is similar to that of (1), but from an analytical point of view there are difficulties.

The series (2) in its rate of convergence is not a serious rival to the expansions considered
in Table 2. Thus the c.f. appears to be on the whole as good an approximation to $\bar{R}(t)$ as
any other.


3. THE LAPLACE CONTINUED FRACTION FOR R(t)
It can be shown (see, for example, Bromwich, 1926, p. 388) that

$$R(t) = t\int_{-\infty}^{\infty} \frac{g(x)\,dx}{x^2 + t^2} = \frac{t}{2\sqrt{\pi}}\int_0^\infty \frac{e^{-x}x^{-\frac12}\,dx}{x + \tfrac12 t^2}, \qquad (12)$$

so that

$$R(t) = \frac{1}{t} - \frac{1}{t^3} + \frac{1.3}{t^5} - \ldots + (-)^s\,\frac{1.3.5\ldots(2s-1)}{t^{2s+1}} + R_s(t), \qquad (13)$$

where

$$|R_s(t)| = \frac{1}{t^{2s+1}}\int_{-\infty}^{\infty} \frac{x^{2s+2}g(x)\,dx}{x^2 + t^2} < \frac{1.3.5\ldots(2s+1)}{t^{2s+3}},$$

which states the familiar fact that the error involved in using any number of terms of (13)
is always less in value than the first term omitted.
In terms of the hypergeometric function we have

$$R(t) = \frac{1}{t} - \frac{1}{t^3} + \frac{1.3}{t^5} - \ldots + (-)^s\,\frac{1.3\ldots(2s-1)}{t^{2s+1}} + (-)^{s+1}\,\frac{1.3\ldots(2s+1)}{t^{2s+3}}\lim_{c\to\infty} F\left(1, s+\tfrac32; c; -\frac{2c}{t^2}\right), \qquad (14)$$

so that using the c.f. of Gauss for this case we find

$$R(t) = \frac{1}{t} - \frac{1}{t^3} + \frac{1.3}{t^5} - \ldots + (-)^s\,\frac{1.3\ldots(2s-1)}{t^{2s+1}} + (-)^{s+1}\,\frac{1.3\ldots(2s+1)}{t^{2s+2}}\left[\frac{1}{t+}\;\frac{2s+3}{t+}\;\frac{2}{t+}\;\frac{2s+5}{t+}\;\frac{4}{t+}\;\frac{2s+7}{t+}\cdots\right], \qquad (15)$$

the c.f. part of this converging for $t > 0$, the odd convergents forming a decreasing sequence,
and the even convergents an increasing sequence. If we take $s + 1 = 0$ in (15) we derive the
Laplace (1805) c.f.

$$R(t) = \frac{1}{t+}\;\frac{1}{t+}\;\frac{2}{t+}\;\frac{3}{t+}\;\frac{4}{t+}\cdots. \qquad (16)$$

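The following results are required in the sequel:

(i) Denoting the $s$th convergent of (16) by $\phi_s = \chi_s/\omega_s$, we see that $\chi_s$ and $\omega_s$ follow the
recurrence relation

$$y_s = t\,y_{s-1} + (s-1)\,y_{s-2} \quad (s = 2, 3, \ldots), \qquad \chi_0 = 0,\ \chi_1 = 1,\ \omega_0 = 1,\ \omega_1 = t. \qquad (17)$$

From (17) it is easily shown that

$$\left.\begin{aligned} \phi_s - \phi_{s-1} &= (-)^{s-1}(s-1)!/(\omega_{s-1}\omega_s), \\ \phi_s - \phi_{s-2} &= (-)^{s}\,t\,(s-2)!/(\omega_{s-2}\omega_s), \end{aligned}\right\} \qquad (18)$$

so that we have

$$\phi_2 < \phi_4 < \phi_6 < \ldots < R < \ldots < \phi_5 < \phi_3 < \phi_1. \qquad (19)$$

(ii) The integral for $R(t)$ may be written

$$R(t) = \int_0^\infty \exp(-\tfrac12 x^2 - xt)\,dx, \qquad (20)$$

which after differentiation and the use of (17) yields

$$R\omega_s - \chi_s = (-)^s \int_t^\infty (x - t)^s g(x)\,dx \Big/ g(t). \qquad (21)$$

(iii) It is well known that $\omega_s$ is a Hermite polynomial and satisfies the differential equation

$$\frac{d^2\omega_s}{dt^2} + t\,\frac{d\omega_s}{dt} - s\,\omega_s = 0. \qquad (22)$$

Or if we set $\omega_s = e^{-\frac14 t^2} w_s$, then

$$\frac{d^2 w_s}{dt^2} = (\tfrac14 t^2 + s + \tfrac12)\,w_s. \qquad (23)$$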


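We require an asymptotic formula for $\omega_s$ for large $s$. Following a method outlined in
Jeffreys & Jeffreys (1946, § 17.122) we put $w_s = \exp\left\{\int_0^t \eta(x)\,dx\right\}$, so that $\eta^2 + d\eta/dt = \tfrac14 t^2 + s + \tfrac12$.
This leads to the asymptotic expressions (in $s$)

$$\left.\begin{aligned} \omega_{2s} &= 1.3.5\ldots(2s-1)\; e^{-\frac14 t^2}\left(1 + \frac{t^2}{8s+2}\right)^{-\frac14} \cosh\{A_{2s}(t)\}, \\ \omega_{2s+1} &= \frac{1.3.5\ldots(2s+1)}{\sqrt{(2s+\tfrac32)}}\; e^{-\frac14 t^2}\left(1 + \frac{t^2}{8s+6}\right)^{-\frac14} \sinh\{A_{2s+1}(t)\}, \end{aligned}\right\} \qquad (24)$$

where $A_s(t) = \tfrac14 t\sqrt{(t^2 + 4s + 2)} + (s + \tfrac12)\ln\{\sqrt{(1 + t^2/(4s+2))} + t/\sqrt{(4s+2)}\}$.

Using an approximation to the factorial we find the following approximations to the
difference between two convergents:

$$\left.\begin{aligned} \phi_{2s+1} - \phi_{2s} &\approx 2\sqrt{(2\pi)}\,\{(1 + t^2/(8s+2))(1 + t^2/(8s+6))\}^{\frac14} \exp\{\tfrac12 t^2 - \tfrac12 t\sqrt{(8s+2)} - \tfrac12 t\sqrt{(8s+6)}\}, \\ \phi_{2s-1} - \phi_{2s} &\approx 2\sqrt{(2\pi)}\,\{(1 + t^2/(8s-2))(1 + t^2/(8s+2))\}^{\frac14} \exp\{\tfrac12 t^2 - \tfrac12 t\sqrt{(8s-2)} - \tfrac12 t\sqrt{(8s+2)}\}. \end{aligned}\right\} \qquad (25)$$

The more complicated formulae for these differences found by using (24) in (18) are perhaps
more reliable when $t$ has a moderate value such as greater than three.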
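
The recurrence (17) and the ordering (19) can be illustrated numerically. The sketch below is not part of the paper: the function names are illustrative, and the reference value uses the identity $R(t) = e^{\frac12 t^2}\sqrt{(\tfrac12\pi)}\,\mathrm{erfc}(t/\sqrt2)$, which follows from the definition of Mills’s ratio.

```python
import math

def laplace_convergents(t, count):
    """Return [phi_1, ..., phi_count] of the Laplace c.f. (16) via recurrence (17)."""
    chi_prev, chi_cur = 0.0, 1.0   # chi_0, chi_1
    om_prev, om_cur = 1.0, t       # omega_0, omega_1
    out = [chi_cur / om_cur]
    for s in range(2, count + 1):
        chi_prev, chi_cur = chi_cur, t * chi_cur + (s - 1) * chi_prev
        om_prev, om_cur = om_cur, t * om_cur + (s - 1) * om_prev
        out.append(chi_cur / om_cur)
    return out

def mills_ratio(t):
    """exp(t^2/2) * integral_t^inf exp(-x^2/2) dx, written with math.erfc."""
    return math.exp(0.5 * t * t) * math.sqrt(math.pi / 2.0) * math.erfc(t / math.sqrt(2.0))

phi = laplace_convergents(1.0, 6)
print(phi, mills_ratio(1.0))
```

At $t = 1$ the even convergents rise towards $R(1)$ and the odd convergents fall towards it, in line with (19).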
4. NUMERICAL COMPARISON OF THREE EXPANSIONS FOR R(t)
We shall consider the expansions

$$R(t) = \frac{1}{t+}\;\frac{1}{t+}\;\frac{2}{t+}\;\frac{3}{t+}\cdots, \qquad (A)$$

$$\bar{R}(t) = \frac{t}{1-}\;\frac{t^2}{3+}\;\frac{2t^2}{5-}\;\frac{3t^2}{7+}\cdots, \qquad (B)$$

$$\bar{R}(t) = e^{\frac12 t^2}\left\{t - \frac{t^3}{3.2.1!} + \frac{t^5}{5.2^2.2!} - \ldots\right\}, \qquad (C)$$

each of which can be used to approximate to R(t) through the relation $R(t) + \bar{R}(t) = 1/(2g)$,
from the point of view of rapidity of convergence. We consider how many terms are required
so that the error is $10^{-(n+1)} \times 2.5$. More precisely, if $C_{s-1}$ and $C_s$ are successive convergents
of a c.f. (or partial sums of a series) providing lower and upper bounds, then we evaluate the
least value of $s$ (an integer) for which $|C_{s-1} - C_s| < 10^{-(n+1)} \times 2.5$ is true approximately. This
is about equivalent to ensuring accuracy in the $n$th place of decimals. For the Laplace c.f.
we use the approximation

$$2\sqrt{(2\pi)}\exp\{\tfrac12 t^2 - \tfrac12 t\sqrt{(4s+2)} - \tfrac12 t\sqrt{(4s-2)}\} = 10^{-(n+1)} \times 2.5,$$

the solution of which is given by

$$s = y^2(t) + 1/[16y^2(t)], \quad \text{where } y(t) = [\tfrac14 t + (1.4992 + 1.1513n)/t],$$

where the second term is unimportant for small $t$. As a simple crude formula for (A) we have
$|\phi_{s+1} - \phi_s| = 2\sqrt{(2\pi)}\exp(\tfrac12 t^2 - 2t\sqrt{s})$. Thus, approximately, the necessary number of terms
varies directly as the square of the accuracy, and inversely as the square of $t$. We have
already seen from § 2 that the rates of convergence of (B) and (C) are very similar. Indeed, 
for small values of t we have 14.4.0 — 441 = 7(T's942 — T's54.1)/2“1, 80 that (B) has an advantage 
in view of the factor 2-1. A similar relation for ¢ not small appears to be complicated. 
A comparison of the three expansions is given in Table 3. 
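
The approximate term count for the Laplace c.f. can be evaluated directly from the solution quoted above, read as $s = y^2 + 1/(16y^2)$ with $y = \tfrac14 t + (1.4992 + 1.1513n)/t$. The sketch below assumes that reading and rounds to the nearest integer; the function name is illustrative.

```python
def laplace_terms_needed(t, n):
    """Approximate number of Laplace c.f. terms for accuracy 2.5 x 10^-(n+1).

    Assumes s = y^2 + 1/(16 y^2) with y = t/4 + (1.4992 + 1.1513 n)/t.
    """
    y = 0.25 * t + (1.4992 + 1.1513 * n) / t
    return y * y + 1.0 / (16.0 * y * y)

for t in (0.1, 0.5, 1.0):
    print(t, round(laplace_terms_needed(t, 6)), round(laplace_terms_needed(t, 12)))
```

With $n = 6$ this gives 7072, 287 and 75 terms for $t$ = 0.1, 0.5 and 1.0, matching the entries for expansion A in Table 3.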


Table 3. Number of terms to achieve a given accuracy for three expansions 





| | Accuracy 
































| 
2-5 x 10-7 2-5 x 10-18 
| 
Expansion A | B C A B | Cc 
| 
| | 
t | | 
0-1 7,072 | 4 3 23,460 eA | 5 
0-25 1,135 4 4 3,760 8 7 
0-5 287 | 6 | 5 946 10 9 
0-75 130 6 7 425 10 ll 
1-0 75 . 8 242 12 13 
1-5 35 10 ll 112 14 16 
2-0 22 14 15 66 18 21 
2-5 | 16 86| (6 19 45 | 22 26 
3-0 11 20 24 30 26 31 
400 | > ji —| | 8 ab ae 
5-0 7 | _ 53 17 | -™ 62 
10-0 4 - | a lo | 197 
Remarks on Table 3


First of all it must be remarked that precise accuracy in a table of this kind is unnecessary.
Most of the numbers in the region of 25 or less were found by direct calculation, but the
rest are approximate. Again there is a slight difference between finding the number of
terms ($s$) of an expansion to ensure an error less than $\epsilon$ when the true value is known, as
compared to relying on the expansion alone to give this information. From this point of
view since the terms in (B) are in pairs greater than and less than the true value, the numbers
given often give greater accuracy than is quoted. Thus four terms of (B) give accuracy in
the ninth decimal place.





[Fig. 1. Number of terms ($s$) for accuracy to be of the order
$10^{-(n+1)} \times 2.5$ for the Laplace c.f. and for series C.]


It will be seen that the number of terms for a required accuracy increases with decreasing
$t$ for the Laplace c.f., the increase being rapid as $t$ approaches zero, while for (B) and (C)
there is a steady increase with increasing $t$. Again for (A) the ratio of the number of terms for
12 places of decimals as compared to 6 is approximately three, whereas the corresponding
factor for (B) and (C) is less and decreases as $t$ increases. Expressed otherwise, we may say
that increasing accuracy for the Laplace c.f. for a given value of $t$ is only achieved at the
expense of an increasing additional number of terms. To take an extreme case, we remark
that for $t = 0.1$, the additional numbers of terms required for successive jumps of six places
of decimals, starting from zero, are: 7072, 16390, 25934, 35477, 45020, 54564 approximately.
For the series (C) the corresponding numbers for the case $t = 10$ are: 187, 10, 10, 10, 9, 9.

A graphical comparison of the expansions (A) and (C) showing the logarithm of the
number of terms against the accuracy is given in Fig. 1.† It is evident from this that the
usefulness of the series (C) falls off slowly with increasing $t$, and from the flatness of the
curves when $n$ is large (for a fixed $t$) it is clear that although convergence may be slow at
first, sooner or later it becomes rapid. By contrast the Laplace c.f. falls off rapidly in its
usefulness as $t$ decreases, and there is little tendency to flatness for increasing $n$ when $t$
is fixed.

To discuss the behaviour of (B) for moderate or large values of $t$ requires an asymptotic
expression for the denominators of the convergents, and this is not at present available.
From the numerical evidence of Table 3 it would appear that (B) may be quicker in converging
than (C).


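5. SCHWARZIAN INEQUALITIES AND BIRNBAUM’S CONJECTURE
Birnbaum (1942) gave the inequality $R(t) > [-t + \sqrt{(t^2 + 4)}]/2$ ($t > 0$), and later (1950)
conjectured what amounts to $R(t) < 4/[3t + \sqrt{(t^2 + 8)}]$ for $t > 0$. We prove generalizations of
these, using the inequality of Schwarz, which in its simplest form states that if $f_1(x)$ and
$f_2(x)$ are linearly independent functions, then

$$\begin{vmatrix} \int_a^b f_1^2(x)\,dx & \int_a^b f_1(x)f_2(x)\,dx \\ \int_a^b f_1(x)f_2(x)\,dx & \int_a^b f_2^2(x)\,dx \end{vmatrix} > 0. \qquad (26)$$

Hence taking $f_1(x) = (x - t)^{\frac12 s} e^{-\frac14 x^2}$, $f_2(x) = (x - t)^{\frac12 s + 1} e^{-\frac14 x^2}$ in (26) and using (21) we find

$$F_s(R) = \begin{vmatrix} R\omega_s - \chi_s & \chi_{s+1} - R\omega_{s+1} \\ \chi_{s+1} - R\omega_{s+1} & R\omega_{s+2} - \chi_{s+2} \end{vmatrix} > 0 \quad (s = 0, 1, 2, \ldots;\ t \ge 0). \qquad (27)$$

This leads to the two sets of inequalities‡

$$R > [\nu_{2s} + (2s)!\,\sqrt{(t^2 + 8s + 4)}]/(2L_{2s}) = R_{2s}, \qquad (28a)$$
$$R < 2\eta_{2s+1}/[\nu_{2s+1} + (2s+1)!\,\sqrt{(t^2 + 8s + 8)}] = R_{2s+1}, \qquad (28b)$$

where $L_s = \omega_s\omega_{s+2} - \omega_{s+1}^2$, $\nu_s = \omega_s\chi_{s+2} + \omega_{s+2}\chi_s - 2\omega_{s+1}\chi_{s+1}$, $\eta_s = \chi_s\chi_{s+2} - \chi_{s+1}^2$,
and $\chi_s/\omega_s$ is the $s$th convergent of the Laplace c.f. The sign of the radical in (28) is easily
determined provided we can show that for $t > 0$, (i) $L_{2s} > 0$ and (ii) $\eta_{2s+1} > 0$. For if this is
the case we see that since

$$F_{2s}(\phi_{2s}) < 0, \quad F_{2s}(\phi_{2s+2}) < 0, \quad F_{2s}(\phi_{2s+1}) > 0, \qquad (29)$$

and also from (19) that $R$ lies between $\phi_{2s+2}$ and $\phi_{2s+1}$, then

$$\phi_{2s+2} < R_{2s} < R < \phi_{2s+1}, \qquad (30)$$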

† I am indebted to Dr J. A. Storrow for drawing this diagram.

‡ When $t = 0$, $R_{2s}(0) = \dfrac{2^{2s-1}\,s!\,(s-1)!}{(2s-1)!\,\sqrt{(2s+1)}}$, so that in the limit $R(0) = \sqrt{(\tfrac12\pi)}$ (from the formula of
Wallis), providing a new demonstration of the value of the total area under the normal curve. A similar
result follows from $R_{2s+1}(0)$.
where $R_{2s}$ is the larger root of $F_{2s}(R) = 0$. Similarly, if (ii) holds, then by considering the
sign of $F_{2s+1}(R_0)$ when $R_0 = \phi_{2s+1}, \phi_{2s+2}, \phi_{2s+3}$, we see that

$$\phi_{2s+2} < R < R_{2s+1} < \phi_{2s+3}, \qquad (31)$$

where $R_{2s+1}$ is the smaller root of $F_{2s+1}(R) = 0$.

To prove (i) we have the integral representation $\omega_s = \int_{-\infty}^{\infty} g(x)(t - x)^s\,dx$, so that the result
follows immediately from the inequality of Schwarz. For (ii) there is the recurrence relation

$$\eta_s = (s^2 - 1)\,\eta_{s-2} + \chi_{s-1}(\chi_{s+1} - (s+1)\chi_{s-1}),$$

and if $\sigma_s = \chi_s - s\chi_{s-2}$ then $\sigma_{s+2} = (s-1)\sigma_s + t^2\chi_s$. But $\sigma_2 = t$, $\sigma_4 = t^3 + t$, and so $\sigma_{2s} > 0$
since $\chi_{2s} > 0$ for $t > 0$. Thus $\eta_{2s+1} > 0$ since $\eta_1 = 2$.

From (19), (30) and (31) it is evident that $\lim_{s\to\infty} R_s = R(t)$ ($t > 0$). Murty (1952) has given
a partial proof of some of the inequalities in (28). Sampford (1953) has shown that $R < R_1$
holds for the extended range $t > -1$. Similar extensions could be given for other cases also;
for example, since the roots of $F_2(R) = 0$ are of opposite sign for $0 > t > -2$, we have
$R > [t^3 - 2t + \sqrt{(t^2 + 12)}]/(t^4 + 3)$ for $t > -2$.

It is obvious from (30) that $R_{2s}$ increases from time to time with $s$; but it is not obvious
that $\{R_{2s}\}$ is a monotonic increasing sequence. This can be proved by considering the
recurrence relations

$$F_s(y) = -sF_{s-1}(y) + (y\omega_s - \chi_s)^2 \qquad (32)$$

and

$$F_s(y) = -F_{s-1}(y) + s(s-1)F_{s-2}(y) + t(y\omega_s - \chi_s)(y\omega_{s-1} - \chi_{s-1}). \qquad (33)$$

For from (32) it is evident that $F_s(R_{s-1}) > 0$, and using this in (33) we see that $F_s(R_{s-2}) < 0$.
It follows that altogether we have

$$F_{2s}(\phi_{2s}) < 0, \quad F_{2s}(R_{2s}) = 0, \quad F_{2s}(\phi_{2s+1}) > 0, \quad F_{2s}(R_{2s-2}) < 0,$$

which in conjunction with (30) proves that $R_{2s} > R_{2s-2}$. Similarly, we can prove $R_{2s+3} < R_{2s+1}$.
Hence we have the result, to be compared with (19),

$$R_0 < R_2 < R_4 < \ldots < R < \ldots < R_5 < R_3 < R_1 \quad (t > 0), \qquad (34)$$
in which any R, is a better approximation than the Laplace ¢,,,. As an indication of the 
closeness of approximation of these Schwarzian sequences, we give the value of the error 


Table 4. Comparison of Schwarzian approximation to R(t) 























t | 0-0 0-1 0-5 | 1-0 | 2-0 3-0 

R | 1:25 1-16 0-88 * 0-66 | 0-42 0-30 
R, | 0-25 0-21 (1-11) p™ (0-48) 7" 1-38 (0-15) | 2-72 (1-21) 2-18 (2-46) 
R, | 0-16 | 0-12 (1-55) | 1-38 (0-50) | 1-11 (0-10) | 2-13 (2-74) 3-22 (2-10) 
R, | 1-99 | 1-70 (1-10) 1-19 (0-30) | 2-42 (1-56) | 332 (2-28) | 4-36 (3-24) 
R,; 1-80 1-53 (1-42) 1-11 (0-29) | 2-20 (1-37) | 4-95 (2-12) 5-74 (4-69) 
R, | 1-61 1-39 (0-95) 2:73 (0-20) | 2-10 (1-24) 4-53 (3-53) 5-18 (4-22) 
R,; 1-53 1-22 (1-34) 2-49 (0-19) | 3-57 (1-17) 4:13 (3-25) 6-49 (5-74) 

P.W. | 0:00 | 5-76 3-93 | 2-73 | 151 0-12 
| 





Figures in the body of the table represent | Approximation - True value | with the same convention
as in Table 2. Figures in parentheses in the same line as $R_s$ ($s$ = 0, 1, ..., 5), refer to $\phi_{s+2}$. P.W. refers
to the Pólya-Williams inequality.

of $R_s$, $s$ = 0 to 5, for six values of $t$ in Table 4. For comparison we also give the error for the
Laplace convergent $\phi_{s+2}$, and that for the Pólya (1945) and Williams (1946) inequality

$$\int_0^t g(x)\,dx < \tfrac12\{1 - \exp(-2t^2/\pi)\}^{\frac12} \quad (t > 0).$$

Strictly speaking the latter is a different type of inequality, and although Pólya gives a
generalization of it, convergency properties have not been considered.


Remarks on Table 4 


The second row gives the value of R(t) to two decimal places merely to indicate the 
importance of the errors in the approximations. It will be observed that the Schwarzian 
values are always better than the corresponding Laplace convergents, and this is par- 
ticularly so for small values of ¢; thus for ¢ = 0-1 the error in R, is within about 4%, whereas 
the error in ¢, is in the region of 100%. It is also interesting to notice that the P.W. values 
are better than the other approximations for ¢< 1-0, but deteriorate for ¢> 1-0. Of course 
for t< 1-0 higher Schwarzian approximations would ultimately become better than the 
P.W. inequality. 
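
The Schwarzian bounds can be computed from the Laplace numerators and denominators. The sketch below assumes the reading of (28) in which $L_s = \omega_s\omega_{s+2} - \omega_{s+1}^2$, $\nu_s = \omega_s\chi_{s+2} + \omega_{s+2}\chi_s - 2\omega_{s+1}\chi_{s+1}$ and $\eta_s = \chi_s\chi_{s+2} - \chi_{s+1}^2$, so that $R_0$ and $R_1$ reduce to Birnbaum's two bounds; the function names are illustrative.

```python
import math

def laplace_polys(t, upto):
    """chi_0..chi_upto and omega_0..omega_upto from y_s = t*y_{s-1} + (s-1)*y_{s-2}."""
    chi, om = [0.0, 1.0], [1.0, t]
    for s in range(2, upto + 1):
        chi.append(t * chi[s - 1] + (s - 1) * chi[s - 2])
        om.append(t * om[s - 1] + (s - 1) * om[s - 2])
    return chi, om

def schwarz_bound(t, s):
    """R_s: a lower bound for even s (28a), an upper bound for odd s (28b)."""
    chi, om = laplace_polys(t, s + 2)
    big_l = om[s] * om[s + 2] - om[s + 1] ** 2
    nu = om[s] * chi[s + 2] + om[s + 2] * chi[s] - 2.0 * om[s + 1] * chi[s + 1]
    eta = chi[s] * chi[s + 2] - chi[s + 1] ** 2
    # s! * sqrt(t^2 + 4s + 4) covers both (2s)! sqrt(t^2+8s+4) and (2s+1)! sqrt(t^2+8s+8).
    root = math.factorial(s) * math.sqrt(t * t + 4 * s + 4)
    if s % 2 == 0:
        return (nu + root) / (2.0 * big_l)
    return 2.0 * eta / (nu + root)

def mills_ratio(t):
    return math.exp(0.5 * t * t) * math.sqrt(math.pi / 2.0) * math.erfc(t / math.sqrt(2.0))

t = 1.0
print([round(schwarz_bound(t, s), 5) for s in range(4)], round(mills_ratio(t), 5))
```

At $t = 1$ the errors of $R_0, \ldots, R_3$ obtained this way are about 0.038, 0.011, 0.004 and 0.002, of the same size as the entries in the $t = 1.0$ column of Table 4.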


6. SUMMARY AND CONCLUSION


We have given a new continued fraction (c.f.) for the normal integral ratio

$$\bar{R}(t) = e^{\frac12 t^2}\int_0^t e^{-\frac12 x^2}\,dx,$$

which turns out to be rapidly convergent for small values of $t$ (say less than $\sqrt{3}$); for moderate
values convergence is slow at first but becomes quite rapid in due course; at least this is
supported by numerical evidence. The rate of convergence of this c.f. is compared with a
series and also with the Laplace c.f. for the Mills’s ratio $R(t) = e^{\frac12 t^2}\int_t^\infty e^{-\frac12 x^2}\,dx$. An interesting
point about the Laplace c.f. is that the difference between the $s$th and $(s-1)$th convergents
has the approximate magnitude $2\sqrt{(2\pi)}\exp(\tfrac12 t^2 - 2t\sqrt{s})$, when $t$ is not large. This, together
with numerical evidence, shows that fair accuracy is attainable with a few terms when $t$ is
moderate to large, but the rate of convergence deteriorates rapidly as $t$ becomes small.
In addition, for a fixed value of $t$ (not large) the necessary number of terms for a given accuracy
varies directly as the square of the accuracy. The behaviour of the new c.f. for $\bar{R}(t)$ is completely
different, for it converges rapidly for small $t$ and deteriorates slowly as $t$ increases,
whereas for a fixed value of $t$, increasing accuracy seems to demand much less in terms than
the square of the accuracy.

We also give two sets of inequalities for $R(t)$, consisting of irrational fractions, these being
generalizations of results by Birnbaum. We prove that one set increases, and the other
decreases, monotonically, to the limit $R(t)$. A particular case includes the formula for the
total area under the normal curve.

In an appendix we give some properties of the c.f.’s for $R(t)$ and $\bar{R}(t)$ which are not
readily available in the literature, and a brief summary of expansions and inequalities.
APPENDIX
(1) Properties of the Laplace continued fraction for R(t)

(a) (i) $R(t) = \sum_{s=0}^{\infty} (-)^s s!/(\omega_s\omega_{s+1})$.

(ii) $R(t) = t\sum_{s=0}^{\infty} (2s)!/(\omega_{2s}\omega_{2s+2})$.

(iii) $R(t) = 1/\omega_1 - t\sum_{s=0}^{\infty} (2s+1)!/(\omega_{2s+1}\omega_{2s+3})$.

These follow from the general theory of c.f.’s.
(d) (i) Neva = Ost Xe- 
(ii) Xe = Op + 0% 4 +0% 9+... 


Inductional methods may be used for (i), from which (ii) is deduced. A general theorem, of which (ii) 
is a particular case, using more elaborate methods is given by Sherman (1933). 
Ro,- s s+l 8s+2 
we AO ER eS (t>0). 
t+ t¢ + & +... 





(c) 
Xs-1— Ro,_; 


The rth convergent of the o.¥F. leads to the approximation ¢,.., to R(t). This result may be used as a com- 
putational check. 
(dz) Remainder formulae: 


atelt [° e-*get 


Ro.;— X23 = —— —~———~ d, 
Wes — X2 Jn }, (a+ 42) a 


28-l(g— ])! © o-2 78-3 
(3 a) 2 (s = 1,2,...; #>0). 


_, — Rw,,_, = —— —— dr 
X2s-1 Was-1 Ja 0 +a 


These follow from the general theory of c.F.’s. 


(e) Systematic summation of asymptotic series for R(t): 


SS dae ae OF beat: (28s—3) ¢ t 30? (28 — 3). 2? 
———+——-—...+(-1)§ — =- — — ——— 
t {28-1 2+2—142-3+4...42-(2s—1) 
(2) Properties of the new continued fraction for R(t) 
R(t) i li li (t>0) 
=—— “ es — = im ?r,= im P,/ . 
‘$46 s Se." co” hee 
(a) Differences of convergence: 
(i) Tes41— Tae = (— 2)*81(1.3..... (28 — 1)) t8***/(GosGas+1)- 
(ii) Tye— Papa = (— 2) (@—1)1(1.3.....(28— 1) 4 day-1 9as)- 
(ili)  Tep—Tge-2 = (— )*-1 (48 — 1) (28 — 2)! t4*-8/(Gg,_29o5)- 
(iv) T2941 —Tas—1 = (— )** (48 + 1) (28— 1)! 0-*/(Ga5-1 Fas+1): 


(6) Formulae for denominators: 


@) ge = | © gla) x(a? — 02) der, 


ao 
(ii) Ges+1 = | g(x) 24+? (a? — #*)* dx. 
(c) Remainder formulae:


*t 
(i) Rg.—Po» = | met- et exp }(0? — 2%) de. 


t 
(ii) Raess1— Pees = [ a*s+2(0? — °)* exp $(t? — x?) da. 
0 


(d) Formulae for computational checks: 


Rqss1— Parr _ (28+1)0 (28+2)#2 (28+3)2? 


siti Ry +3 —- 440+ &+7 —... 





n terms leading to the approximation r,,,,,, for R(t). 


Raa, — Pes od 2st? (2s+1)t2 (28+2)# 


ii = = 
Oe 4s+1— 484+3 + 484+5 —-... 





n terms leading to the approximation r,,,,, for R(t). 
(e) Partial sums of series for R(t): 
s—1 g2rti 


s a ¢ 3t? (2s — 3) ¢ 
See RD wcevefB:. 1). , 


—#4+3—-0#45—...—42s8—1° 





t 
1 


(3) Miscellaneous results 
(a) Factorial type of series for R(t) due to Schlémilch (1895) 
1 «o io 4) 
tR(t) = 14+— ¥(- 2y | x) e-2 -bdx](u, Ug... Uy), 
VT p=1 0 
where U,p=2+2r, aM =a(~—-1)...(e—r+1) (t>0). 


(This result is mentioned by Wishart (1927).) 
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In particular tR(t) = 1—-— +—_ —- ——__ + —_- - ——_——_ 
Uy UjzUg UyUgls UjUglgUy Uy UgUy Uy Us 
t 4 tan 2 jn 
(b) | g(x) da <a;42n)-| n | en-le-iz as | (t>0), 
0 0 
_(n 1j/n 
where tgs 2n-if I (5+ :) | in = 8. 3,...). 


Due to Pélya (1945), the particular case n = 2 being given by Williams (1946). (See also Tate (1953).) 


(c) Minimum property of the convergents of the Laplace c.F.: 


t «© l 2 
in —-— —&y-i(x t?) {| ———__ — dx = R(t)— + / +2 
_ al. e~*a-*x+ 4 leer nc} x (t) — Xos+2/Wos+2 
pe 2 © 1 ° , 
_ = [. e~*xi(x + $0") e774) dx = t[X25+3/H29+3 — R(t)], 


the minimum being taken over all polynomials of precise degree s. These are implied in a general theorem 
due to Stieltjes. (See, for example, Shohat & Tamarkin (1943), p. 75.) In a similar way we find the 
following inequalities: 
(i) $(t^2+1)R(t) - t > 2t/(t^4 + 6t^2 + 15)$,

(ii) $(t^2+1)R(t) - t > 2(t^5 + 14t^3 + 75t)/(t^8 + 20t^6 + 150t^4 + 420t^2 + 525)$,

(iii) $(t^3+3t)R(t) < t^2 + 2 - 3!/(t^4 + 10t^2 + 35)$,

(iv) $(t^3+3t)R(t) < t^2 + 2 - 3!\,(t^4 + 18t^2 + 119)/(t^8 + 28t^6 + 294t^4 + 1260t^2 + 2205)$,

(v) $(t^4+6t^2+3)R(t) - t^3 - 5t > 4!\,t/(t^6 + 15t^4 + 105t^2 + 315)$,

(vi) $(t^5+10t^3+15t)R(t) < t^4 + 9t^2 + 8 - 5!/(t^6 + 21t^4 + 189t^2 + 693)$,
where $t > 0$ throughout.
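The six inequalities can be spot-checked numerically. The sketch below is not part of the original appendix. It assumes that R(t) denotes the ratio $e^{t^2/2}\int_t^\infty e^{-x^2/2}\,dx$ (an assumption, since the defining equation lies outside this appendix), evaluated through the standard-library function `math.erfc`. The names `mills_ratio` and `inequality_gaps` are illustrative only.

```python
import math


def mills_ratio(t):
    # Assumed definition: R(t) = exp(t^2/2) * integral_t^inf exp(-x^2/2) dx.
    return math.sqrt(math.pi / 2.0) * math.exp(0.5 * t * t) * math.erfc(t / math.sqrt(2.0))


def inequality_gaps(t):
    """Return (larger side - smaller side) for inequalities (i)-(vi).

    Every entry should be positive when t > 0.
    """
    r = mills_ratio(t)
    t2 = t * t
    return {
        "i": (t2 + 1) * r - t - 2 * t / (t**4 + 6 * t2 + 15),
        "ii": (t2 + 1) * r - t
        - 2 * (t**5 + 14 * t**3 + 75 * t)
        / (t**8 + 20 * t**6 + 150 * t**4 + 420 * t2 + 525),
        "iii": t2 + 2 - 6 / (t**4 + 10 * t2 + 35) - (t**3 + 3 * t) * r,
        "iv": t2 + 2
        - 6 * (t**4 + 18 * t2 + 119)
        / (t**8 + 28 * t**6 + 294 * t**4 + 1260 * t2 + 2205)
        - (t**3 + 3 * t) * r,
        "v": (t**4 + 6 * t2 + 3) * r - t**3 - 5 * t
        - 24 * t / (t**6 + 15 * t**4 + 105 * t2 + 315),
        "vi": t**4 + 9 * t2 + 8
        - 120 / (t**6 + 21 * t**4 + 189 * t2 + 693)
        - (t**5 + 10 * t**3 + 15 * t) * r,
    }


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0, 3.0):
        print(t, inequality_gaps(t))
```

For large t the two sides of each inequality agree to many figures, so a floating-point check of this kind is only informative for moderate t.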





(d) The weight function in probit analysis.


t co 
If y(t) = aa | eae { e-i*" dx, 
—-2 t 


then 7(t) is a decreasing function of ¢? (Hammersley, 1950; Tate, 1953). 
(e) Rough formula for the normal distribution function: 


«oo 
—10log [ g(x) dx = 24t2+4+10log¢ with an error less than 1 if 2<t<14 
t 
(Good, 1950). 
A CONFIDENCE REGION FOR THE SOLUTION OF A SET OF
SIMULTANEOUS EQUATIONS WITH AN APPLICATION TO 
EXPERIMENTAL DESIGN* 


By G. E. P. BOX, Imperial Chemical Industries, Blackley, Manchester, England, 
and The Institute of Statistics, Raleigh, N.C. 


AND J. S. HUNTER, The Institute of Statistics, Raleigh, N.C.


1. INTRODUCTION 


The problem of finding limits of error for the solutions of a set of k linear equations obtained 
by equating each of the quantities 


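\[
\begin{aligned}
a_{10} + a_{11}x_1 + a_{12}x_2 + \ldots + a_{1k}x_k &= \delta_1,\\
a_{20} + a_{21}x_1 + a_{22}x_2 + \ldots + a_{2k}x_k &= \delta_2,\\
&\;\;\vdots\\
a_{k0} + a_{k1}x_1 + a_{k2}x_2 + \ldots + a_{kk}x_k &= \delta_k,
\end{aligned}
\tag{1}
\]

to zero, where $a_{10}, a_{11}, \ldots, a_{ij}, \ldots, a_{kk}$ are subject to error, was considered by Lonseth (1942)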
who gives a number of references to earlier writers. He obtained a series for the error of 
any unknown, considered as a function of the k(k+1) errors in the coefficients. He also 
developed a criterion (which depended on the conditioning of the equations and the small- 
ness of the errors relative to their coefficients) for the convergence of his series. Lonseth 
and more recent writers such as Hotelling (1943) and Turing (1948) have been chiefly 
concerned with the effect of rounding errors. Not infrequently, however, we are faced with 
the problem where large observational errors occur in the coefficients and the equations 
may not be well conditioned. It then seems essential to consider the errors in the solution 
not individually but jointly, in fact to determine a confidence region for the possible solution 
in the space of $x_1, x_2, \ldots, x_k$. One example of this circumstance which has attracted the
attention of the authors to the problem occurs when attempts are made to attach limits of 
error to the position of a maximum in experiments of the type discussed by Box & Wilson 
(1951). Further reference will be made to this problem later. 


2. AN EXACT CONFIDENCE REGION 


We assume that the errors in the coefficients $a_{10}, a_{11}, \ldots, a_{kk}$ are distributed multinormally
with a $k(k+1)$ by $k(k+1)$ variance-covariance matrix $\Omega\sigma^2$, known apart from the factor $\sigma^2$,
and that an estimate $s^2$ of $\sigma^2$ is available based on $\phi$ degrees of freedom and distributed as
$\sigma^2\chi^2(\phi)/\phi$ independently of the errors in the coefficients. We use the notation $\chi^2(\phi)$ to denote
a quantity distributed in the $\chi^2$ distribution with $\phi$ degrees of freedom; similarly, $F(\phi_1, \phi_2)$
denotes a quantity distributed in the Fisher-Snedecor $F$ distribution with $\phi_1$ and $\phi_2$ degrees
of freedom. A confidence region may now be found by an extension of an argument given
by Fieller (1940).

Consider the expressions $\delta_1, \delta_2, \ldots, \delta_k$ of equation (1) for a fixed set of values $x_1^0, x_2^0, \ldots, x_k^0$.
Knowing $\Omega$ we may readily calculate the variance-covariance matrix $E(\boldsymbol{\delta}\boldsymbol{\delta}') = V\sigma^2$, where


* Work sponsored by The Office of Ordnance Research, U.S. Army. 
$\boldsymbol{\delta}$ denotes the $k \times 1$ column vector of the $\delta$'s. Since $\boldsymbol{\delta}'V^{-1}\boldsymbol{\delta}/(ks^2)$ is distributed as $F(k, \phi)$, the
probability is $1 - \alpha$ that the inequality
\[
\frac{\boldsymbol{\delta}'V^{-1}\boldsymbol{\delta}}{ks^2} \le F_\alpha(k, \phi) \tag{2}
\]
is true, where $F_\alpha$ denotes the $\alpha$ probability point of the $F$ distribution.
Now consider the space of the $k(k+2)+1$ quantities $a_{10}, a_{11}, \ldots, a_{kk}, s^2, x_1, x_2, \ldots, x_k$.
The expression $\boldsymbol{\delta}'V^{-1}\boldsymbol{\delta}/(ks^2)$
is a function of these quantities, and equation (2) defines a region $R(x_1^0, x_2^0, \ldots, x_k^0)$ on the
hyperplane $x_1 = x_1^0, x_2 = x_2^0, \ldots, x_k = x_k^0$ such that the probability that the point $a_{10}, a_{11}, \ldots, a_{kk},
s^2, x_1^0, x_2^0, \ldots, x_k^0$ falls within this region is $1 - \alpha$. Now if we combine such regions for all values
of $x_1, x_2, \ldots, x_k$ we obtain a region $R$ in the whole $\{k(k+2)+1\}$-dimensional space which is
such that the chance that a point $a_{10}, a_{11}, \ldots, a_{kk}, s^2, x_1, x_2, \ldots, x_k$ falls within $R$ is $1 - \alpha$. It
follows that, given a set of observed values $a_{10}^0, a_{11}^0, \ldots, a_{kk}^0, (s^0)^2$, the region within $R$ on the
hyperplane $a_{10} = a_{10}^0, a_{11} = a_{11}^0, \ldots, a_{kk} = a_{kk}^0, s^2 = (s^0)^2$ supplies the $1 - \alpha$ confidence region
in the usual sense for $x_1, x_2, \ldots, x_k$. A valuable discussion of the confidence argument
pertinent to this problem has been given by Bose & Roy (1953).
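The limits of the $1 - \alpha$ confidence region are given therefore by
\[
\boldsymbol{\delta}'V^{-1}\boldsymbol{\delta} = s^2 k F_\alpha(k, \phi). \tag{3}
\]
Now the original quadratic form may be written as a ratio of two determinants
\[
\boldsymbol{\delta}'V^{-1}\boldsymbol{\delta} = -\begin{vmatrix} 0 & \boldsymbol{\delta}' \\ \boldsymbol{\delta} & V \end{vmatrix} \Big/ \,|V|. \tag{4}
\]
Consequently, in general the boundary of the confidence region is defined by those values
of $x_1, x_2, \ldots, x_k$ which cause the equation
\[
\begin{vmatrix} s^2 k F_\alpha & \boldsymbol{\delta}' \\ \boldsymbol{\delta} & V \end{vmatrix} = 0 \tag{5}
\]
to be satisfied. If $\sigma^2$ is known, $s^2 k F_\alpha$ will be replaced by $\chi^2_\alpha\sigma^2$ in (5).
In the important special case in which the estimates $a_{10}, a_{11}, \ldots, a_{kk}$ are uncorrelated (3)
gives for the boundary of the confidence region
\[
\sum_{i=1}^{k} \{\delta_i^2/v(\delta_i)\} = \sum_{i=1}^{k} \Big\{ \Big(\sum_{j=0}^{k} a_{ij}x_j\Big)^2 \Big/ \sum_{j=0}^{k} v(a_{ij})x_j^2 \Big\} = s^2 k F_\alpha, \tag{6}
\]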


where $v(\delta_i)\sigma^2$ is the variance of $\delta_i$, $v(a_{ij})\sigma^2$ is the variance of $a_{ij}$ and $x_0 = 1$. If, in addition,
the variances for all the coefficients are equal ($v(a_{ij}) = v(a)$; $i, j = 1, 2, \ldots, k$), the boundary
of the confidence region is a quadric surface given by the equation
\[
\sum_{i=1}^{k} \delta_i^2 = \sum_{i=1}^{k} \Big(\sum_{j=0}^{k} a_{ij}x_j\Big)^2 = \sum_{j=0}^{k} x_j^2\, v(a)\, s^2 k F_\alpha(k, \phi). \tag{7}
\]


3. CONDITIONING OF THE EQUATIONS 
A circumstance which profoundly affects the nature of the confidence region is whether 
the equations are well or poorly ‘conditioned’. We first explain what we mean by this 
term. 
In an obvious matrix notation (1) becomes
\[
\boldsymbol{\delta} = A\mathbf{x} + \mathbf{a}_0. \tag{8}
\]
Consider the quadric defined by
\[
\boldsymbol{\delta}'\boldsymbol{\delta} = (A\mathbf{x} + \mathbf{a}_0)'(A\mathbf{x} + \mathbf{a}_0). \tag{9}
\]
For any fixed value of $z$
\[
\boldsymbol{\delta}'\boldsymbol{\delta} = \sum_{i=1}^{k} \delta_i^2 = z \tag{10}
\]
defines a surface which we call the conditioning surface in the $(k+1)$-dimensional space of
$z, x_1, x_2, \ldots, x_k$. If $|A| \neq 0$, in the $k$ space of $x_1, x_2, \ldots, x_k$ the contours of (10) for fixed values
of $z$ are ellipsoids. Referring these contours to their centre $\mathbf{x}_0$ as the origin and using their
principal axes as new co-ordinates $X_1, X_2, \ldots, X_k$, equation (10) may be written
\[
\sum_{i=1}^{k} \delta_i^2 = \sum_{i=1}^{k} \lambda_i X_i^2 = z, \tag{11}
\]
where $X_i = \mathbf{u}_i'(\mathbf{x} - \mathbf{x}_0)$, $\mathbf{u}_i$ $(i = 1, 2, \ldots, k)$ is a latent vector of $A'A$, and $\lambda_i$ is the corresponding
latent root (essentially positive). Clearly, if one or more of the latent roots are small com- 
pared with the remainder, the conditioning surface is attenuated in the direction of these 
axes. Thus, suppose $\lambda_1, \lambda_2, \ldots, \lambda_r$ are small compared with the remainder, then values of
$x_1, x_2, \ldots, x_k$ differing greatly from the correct solution but corresponding to points near the
hyperplane $h_r$ passing through the axes of $X_1, X_2, \ldots, X_r$ produce only small values of $\sum_{i=1}^{k} \delta_i^2$
and therefore only small values of $\delta_1, \delta_2, \ldots, \delta_k$ in (1). In this case, therefore, a wide variety of
'nearly correct' solutions of the equations $\mathbf{a}_0 + A\mathbf{x} = 0$ exist leading to difficulties in the
numerical solution, and the equations are said to be 'poorly conditioned'. An interesting
example of such equations is given by Morris (1946). In the limit when $\lambda_1, \lambda_2, \ldots, \lambda_r$ are
exactly zero, any point in the hyperplane $h_r$ satisfies the equations, $A$ being now of rank $k - r$,
and this is the well-known case where $r$ of the unknowns (that is, $r$ co-ordinates of a solution
point) may be assigned arbitrarily.
Turing (1948) proposed as a measure of ill conditioning the quantity 


\[
c = k^{-1} N(A)\, N(A^{-1}), \tag{12}
\]
where $N(A) = (\operatorname{trace} A'A)^{1/2} = \big(\sum_{ij} a_{ij}^2\big)^{1/2} = \big(\sum_i \lambda_i\big)^{1/2}$ is the norm of the matrix $A$. The criterion
is in line with the discussion above, for we see that
\[
c = k^{-1} \Big(\sum_i \lambda_i\Big)^{1/2} \Big(\sum_i \lambda_i^{-1}\Big)^{1/2} \tag{13}
\]
is a homogeneous function of degree zero in the $\lambda$'s and is thus dependent only on their
relative magnitudes. It takes the value of unity if all the $\lambda$'s are equal (so that $A$ is orthogonal),
and is greater than unity otherwise. Its value will be large if any of the $\lambda$'s are small
compared with the others.


4. EXAMPLES OF CONFIDENCE REGIONS


The confidence region for the solution of a set of linear equations would be expected to
depend on (i) the magnitude of the errors in the coefficients and (ii) the state of the conditioning
of the equations. The separate contribution of these two influences can be seen
particularly readily when the coefficients have equal variance and are uncorrelated, so
that equation (7) may be used to obtain the confidence region.
For example, consider the pair of equations
\[
3x_1 - 5x_2 - 1 = \delta_1, \tag{14}
\]
\[
3x_1 + 3x_2 - 9 = \delta_2, \tag{15}
\]
when $\delta_1 = \delta_2 = 0$. These have the solution $x_1 = 2$, $x_2 = 1$ indicated by the cross in Fig. 1a.
If the coefficients are subject to error, each with a known standard deviation equal to one,
and they are uncorrelated, then the 95% confidence region obtained by setting

$v(a)s^2kF_\alpha(k, \phi)$ equal to $1 \times \chi^2_{0.05}(2) = 5.99$

in equation (7) is shown by the bold line in Fig. 1a. We note from equation (7) that the
boundary of the confidence region is the intersection of the surface
\[
\delta_1^2 + \delta_2^2 = z \tag{16}
\]
with the surface
\[
(1 + x_1^2 + x_2^2)\, v(a)\, \sigma^2 \chi^2_\alpha = z. \tag{17}
\]


Contours of these two surfaces for z = 25 and z = 120 are shown by the dotted lines in the 
figure and intersect on the confidence region. In separating equation (7) into the two parts 
we separate the two features which decide the confidence region. Equation (17) defines a 
saucer-like surface represented by the circular contours. The steepness of the surface depends 
on $v(a)\sigma^2$, the variance of the coefficients. Equation (16), on the other hand, defines the conditioning
surface as an elliptical valley whose contours are concentric ellipses and whose
attenuation is a measure of the conditioning of the equations. If $v(a)$ is small then the surface
described by (17) will be shallow and will cut (16) to form a small region whose shape is
very like that of the contours of the conditioning surface. If $v(a)$ is large, however, the
surface (17) will rise more steeply and the confidence region will be larger and its shape 
will, to some extent, be distorted away from the shape of the contours of (16). In particular, 
the region will tend to extend farther from the solution point on the side remote from the 
origin. 

Equations (14) and (15) are a well conditioned pair with $\lambda_1 = 16$, $\lambda_2 = 36$, $c = 1.08$.
A somewhat less well conditioned pair is obtained if one-half of equation (14) is added to equation
(15) to give a new equation to replace (14) so that we have
\[
4.5x_1 + 0.5x_2 - 9.5 = 0, \tag{18}
\]
\[
3.0x_1 + 3.0x_2 - 9.0 = 0, \tag{19}
\]
having the same solutions as before, but now $\lambda_1 = 4.2$, $\lambda_2 = 34.4$, $c = 1.55$. Then if we again
assume that the coefficients have standard deviations $\sigma = 1$ and are uncorrelated we
obtain the open confidence interval shown by the bold lines in Fig. 1b.
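The latent roots of $A'A$ and Turing's measure (12) are easy to evaluate for a pair of equations. The following is a minimal sketch for the 2 × 2 case only, written in plain Python; the function names are illustrative and do not come from the paper. Here $A$ holds the coefficients of $x_1$ and $x_2$, the constant terms playing no part.

```python
import math


def frobenius_norm(m):
    # N(A) = (sum of squared elements)^(1/2), as in the definition following (12).
    return math.sqrt(sum(v * v for row in m for v in row))


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def turing_condition(m):
    # c = k^{-1} N(A) N(A^{-1}) with k = 2 here.
    return frobenius_norm(m) * frobenius_norm(inverse_2x2(m)) / len(m)


def latent_roots_2x2(m):
    # Roots of A'A from its trace and determinant (det A'A = (det A)^2).
    (a, b), (c, d) = m
    trace = a * a + b * b + c * c + d * d
    det = (a * d - b * c) ** 2
    disc = math.sqrt(trace * trace - 4.0 * det)
    return (trace - disc) / 2.0, (trace + disc) / 2.0


if __name__ == "__main__":
    first_pair = [[3.0, -5.0], [3.0, 3.0]]   # coefficients of (14) and (15)
    second_pair = [[4.5, 0.5], [3.0, 3.0]]   # coefficients of (18) and (19)
    for name, m in (("(14),(15)", first_pair), ("(18),(19)", second_pair)):
        print(name, latent_roots_2x2(m), turing_condition(m))
```

For equations (14) and (15) this gives latent roots 16 and 36 and $c \approx 1.08$, in agreement with the values quoted above.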

The great difference in the shape of the confidence regions found with the two different
pairs of equations is seen to be due to differences in conditioning as typified by the conditioning
surfaces. Although the second set of equations is not seriously ill-conditioned, its
conditioning surface is considerably more attenuated than that of the first set. Thus, when
the surface corresponding to (17) is cut by that of (16) the open ended hyperbolic confidence
region shown in Fig. 1b is obtained.

The effect of ill-conditioning is, in general, to cause the confidence region to spread out in 
the direction of the axes of the conditioning surface corresponding to the small latent roots 
of A’A. 
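Whether a trial point lies inside the region bounded by (7) can be checked directly. The sketch below assumes, as in the example above, uncorrelated coefficients with $v(a)\sigma^2 = 1$ and the 95% value 5.99 quoted in the text; the function name `inside_region` and the row layout `(a_i0, a_i1, ..., a_ik)` are illustrative choices, not the authors'.

```python
CHI2_95_2DF = 5.99  # value of the 5% point of chi-square with 2 d.f. quoted in the text


def inside_region(rows, x, scale=CHI2_95_2DF):
    """True if x satisfies sum_i delta_i^2 <= (1 + sum_j x_j^2) * scale.

    rows  : one tuple (a_i0, a_i1, ..., a_ik) per equation
    x     : trial solution (x_1, ..., x_k)
    scale : v(a) * sigma^2 * chi-square point (or v(a) * s^2 * k * F)
    """
    xs = (1.0,) + tuple(x)  # x_0 = 1 carries the constant term
    sum_delta_sq = sum(sum(c * v for c, v in zip(row, xs)) ** 2 for row in rows)
    return sum_delta_sq <= sum(v * v for v in xs) * scale


if __name__ == "__main__":
    pair_14_15 = [(-1.0, 3.0, -5.0), (-9.0, 3.0, 3.0)]
    pair_18_19 = [(-9.5, 4.5, 0.5), (-9.0, 3.0, 3.0)]
    print(inside_region(pair_14_15, (2.0, 1.0)))      # the solution point
    print(inside_region(pair_14_15, (40.0, -90.0)))   # a remote point
    print(inside_region(pair_18_19, (40.0, -90.0)))   # same point, second pair
```

The remote point (40, -90) is rejected for equations (14) and (15) but accepted for (18) and (19), which is one way of seeing that the second region is open in the direction of the small latent root.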


Fig. 1a. 95% confidence region for the solution of the linear equations
$3x_1 - 5x_2 - 1 = 0$ and $3x_1 + 3x_2 - 9 = 0$.

Fig. 1b. 95% confidence region for the solution of the linear equations
$4.5x_1 + 0.5x_2 - 9.5 = 0$ and $3.0x_1 + 3.0x_2 - 9.0 = 0$.
5. THE CONFIDENCE REGION FOR THE STATIONARY
POINT ON A FITTED SECOND-DEGREE SURFACE 

In a paper concerning experimental methods for attaining optimum conditions, Box & 
Wilson (1951) discussed the fitting by least squares of a polynomial equation relating an 
observed response to the levels of $k$ independent variables, $x_1, x_2, \ldots, x_k$. Suggested experimental
designs were discussed from the point of view of minimizing both the random errors
in the estimates of the coefficients, and of the biases contributed by possible higher order 
terms which had been ignored in the response function. 

Suppose that a polynomial equation of the second degree is fitted and for simplicity assume
that there are only two independent variables $x_1$ and $x_2$. If the equation of the fitted response
surface is
\[
Y = a_{00} + a_{10}x_1 + a_{20}x_2 + \tfrac{1}{2}a_{11}x_1^2 + \tfrac{1}{2}a_{22}x_2^2 + a_{12}x_1x_2, \tag{20}
\]
the centre of the fitted system is at a point corresponding to the solution of the equations
\[
\frac{\partial Y}{\partial x_1} = a_{10} + a_{11}x_1 + a_{12}x_2 = \delta_1 = 0, \tag{21}
\]
\[
\frac{\partial Y}{\partial x_2} = a_{20} + a_{12}x_1 + a_{22}x_2 = \delta_2 = 0, \tag{22}
\]
which are of the same form as equations (1) and (8) in the special case in which the matrix
$A$ is symmetric. Furthermore, the variance-covariance matrix of the least-squares estimates
which corresponds to $\Omega\sigma^2$ is known apart from $\sigma^2$ and is dependent only on the experimental
design used, and an estimate $s^2$ of $\sigma^2$ is usually available either from the residual sum of
squares (assuming an adequate model) or from some independent estimate obtained by
replication. Thus to obtain a confidence region for the centre of the system we can immediately
apply equation (5).

It should be remembered, of course, that in practice an important source of error not 
taken account of in the above confidence region arises due to the possible lack of fit of the 
second degree equation. Such lack of fit introduces errors not only directly in the sense that 
there is no second-degree equation which can adequately represent the surface, but also 
because the omission of higher order terms necessary to give an exact fit may cause the least- 
squares estimates of the second-degree equation to be biased. However, provided this 
limitation is borne in mind it is instructive to consider the size and type of confidence region 
that arises due to sampling errors in the coefficients alone. 

We may first note the relationship that exists between the fitted response surface and the
conditioning surface discussed in §3. The equation of the fitted surface written in matrix
notation is
\[
Y - a_{00} = \mathbf{x}'\mathbf{a}_0 + \tfrac{1}{2}\mathbf{x}'A\mathbf{x}, \tag{23}
\]
and that of the conditioning surface is given by equation (9), where $A$ is necessarily symmetric.
On differentiation we see that if $A$ is non-singular both systems have the same centre.
Furthermore, the latent vectors of $A'A = A^2$ are the same as those of $A$ and the latent
roots are the squares of those of $A$.

Thus the centre and the principal axes of the conditioning surface are the same as those 
of the response surface, but attenuation of the response surface is accompanied by even 
greater attenuation of the conditioning surface and the contours of the conditioning surface
are always ellipses. We see, therefore, that to the extent to which the confidence region 
reflects the influence of the conditioning surface, it will (to some lesser extent) reflect the 
characteristics of the response surface itself. 


It should be remembered that the method described provides a confidence region for
a stationary point, which has as its co-ordinates the solution to the equations (21) and (22). 
If the coefficients in (21) and (22) are such as to indicate that the fitted surface possesses 
a maximum, and the confidence region is closed, we may assert that this is a confidence 
region for a true maximum (that is to say, not merely a confidence region for a stationary 
point). This is so because, assuming that the surface can be represented by an equation of 
second degree, if the plausible range of variation in the a’s includes surfaces in which one 
or more of the $\lambda$'s could change in sign, then points lying along the axes corresponding to
these roots would be acceptable as solutions of equations (21) and (22), and open contours 
would necessarily result. 


6. EXAMPLE OF A CONFIDENCE REGION OBTAINED USING AN EXPERIMENTAL DESIGN 


To give some appreciation of the type of confidence regions for an experimentally determined
stationary point met in practice, a statistical experiment was carried out as follows.
It was assumed that the true equation of a surface was
\[
\eta = 78.373 + 4.533x_1 - 1.867x_2 - 3.333x_1^2 - 3.333x_2^2 + 4.000x_1x_2 \tag{24}
\]
and that a $3^2$ factorial experiment was conducted at the levels $-1, 0, 1$ for the independent
variables $x_1$ and $x_2$. Random normal deviates with $\sigma = 1$ were added to the values of $\eta$ at
the nine points in the design. The resulting values are recorded in Fig. 2. The second-degree
equation fitted by least squares to these points was
\[
Y = 77.82 + 3.70x_1 - 1.62x_2 - 2.83x_1^2 - 2.58x_2^2 + 4.02x_1x_2 \tag{25}
\]
whose contours are indicated by the dotted lines in Fig. 2 and whose centre, $x_1^0 = 0.96$,
$x_2^0 = 0.44$, is indicated by a cross. The residual sum of squares, based on 3 degrees of freedom,
was 7.20, giving an estimate of $s^2 = 2.40$, assuming the model is adequate. The variances
and covariances for the coefficients of (25) estimated from a $3^2$ factorial are
\[
V(a_{00}) = \tfrac{5}{9}\sigma^2, \quad V(a_{10}) = V(a_{20}) = \tfrac{1}{6}\sigma^2, \quad V(a_{11}) = V(a_{22}) = 2\sigma^2, \quad V(a_{12}) = \tfrac{1}{4}\sigma^2,
\]
\[
\operatorname{cov}(a_{00}, a_{11}) = \operatorname{cov}(a_{00}, a_{22}) = -\tfrac{2}{3}\sigma^2.
\]
Substituting the values of $s^2 = 2.40$, and the value of $F_{0.05}(2, 3) = 9.55$ in equation (5), we
obtain for the 95% confidence region the shaded portion indicated in Fig. 2.
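The centre quoted for the fitted surface (25) follows from solving the two linear equations (21) and (22). The sketch below does this by Cramer's rule; it takes the coefficients as printed in (25), so that the coefficient of $x_1^2$ is $\tfrac{1}{2}a_{11}$ in the notation of (20). The function name `stationary_point` is an illustrative choice.

```python
def stationary_point(b1, b2, b11, b22, b12):
    """Centre of Y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2.

    In the notation of (20): a10 = b1, a20 = b2, a11 = 2*b11, a22 = 2*b22, a12 = b12.
    Solves a10 + a11*x1 + a12*x2 = 0 and a20 + a12*x1 + a22*x2 = 0.
    """
    a11, a22, a12 = 2.0 * b11, 2.0 * b22, b12
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("no unique stationary point")
    x1 = (-b1 * a22 + a12 * b2) / det
    x2 = (-a11 * b2 + a12 * b1) / det
    return x1, x2


if __name__ == "__main__":
    # Coefficients of the fitted equation (25).
    print(stationary_point(3.70, -1.62, -2.83, -2.58, 4.02))
    # Residual mean square from the quoted sum of squares and degrees of freedom.
    print(7.20 / 3)
```

With the coefficients of (25) the result rounds to the centre (0.96, 0.44) given above.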

In practice, having fitted the second-degree surface the experimenter would perform 
confirmatory experiments. A natural location for additional experiments would be along 
the principal axes of the fitted conic. We have assumed that six further points were added 
as shown in Fig. 3. The values which might have been obtained at these points, indicated 
in Fig. 3, were secured by calculating the values at the points using equation (24) and adding 
random normal deviates as before. Using all 15 points the equation was refitted giving 


\[
Y = 77.95 + 3.76x_1 - 1.57x_2 - 2.87x_1^2 - 2.64x_2^2 + 3.84x_1x_2. \tag{26}
\]
The symmetric variance-covariance matrix for the new estimates $a_{ij}$ is indicated below:

           00         10         20         11         22         12
00    0.18684   -0.00296    0.00921   -0.10282   -0.23162    0.08576
10               0.13091   -0.01361   -0.12340    0.05762   -0.01263
20                          0.13574    0.01184    0.08260   -0.02844
11                                     0.43848   -0.15168   -0.09496
22                                                0.95532   -0.18868
12                                                           0.14688

The residual sum of squares was 9.60, and hence the residual error variance $s^2$ was 1.07,
now based on 9 degrees of freedom. The centre of the newly fitted equation is at the point
$x_1^0 = 0.89$, $x_2^0 = 0.37$ indicated by the cross in Fig. 3.








Fig. 2. 95% confidence region (shaded) for an estimated stationary point
based on nine determinations from a $3^2$ factorial design.


The confidence region would now be expected to be considerably smaller
(i) because of the influence of the extra points,
(ii) because of the larger number of degrees of freedom upon which $F$ is based (the
critical value of $F$ is reduced from 9.55 to 4.26),
(iii) because the first estimate of $\sigma^2$ ($s^2 = 2.4$) happens to be considerably greater than the second
estimate ($s^2 = 1.07$).
The new confidence region, shown as the shaded area in Fig. 3, is closed but attenuated.

The size and attenuation of the regions found emphasize the considerable uncertainty 
that would sometimes exist in the location of a maximum, even when the profound effect 
of lack of fit of the second-degree equation was ignored. 













Fig. 3. 95% confidence region (shaded) for an estimated stationary point 
after the addition of six supplementary determinations. 


CONCLUSION 


In this paper the discussion has been confined to obtaining confidence regions for the 
solution of sets of linear equations. However, it is worth noting that this general method 
may be used for any set of simultaneous equations which are linear in the coefficients. Thus, 
in principle, we could use the method to find a confidence interval for the solution of the 
$k$ equations
\[
a_{s0} + \sum_{i=1}^{p} a_{si} f_{si}(x_1, x_2, \ldots, x_k) = 0 \quad (s = 1, 2, \ldots, k). \tag{28}
\]


Consequently, confidence intervals could be obtained for a stationary point on a surface 
represented by any equation linear in the coefficients, and not only for the quadric surface 
we have discussed. 
ON THE MOMENTS OF ORDER STATISTICS IN SAMPLES
FROM NORMAL POPULATIONS 


By H. RUBEN 
Agricultural Research Council Fellow 


The purpose of this paper is to show the geometrical significance of the moments of order statistics 
derived from normal populations. It appears that these, as well as the moment-generating function 
of the square of any order statistic, are intimately related to the contents of the members of a class 
of hyperspherical simplices. Further geometrical interpretations, together with the extension of the
present results to bivariate moments and other properties, will be provided in subsequent publications. 

The problem of order statistics in normal populations has been extensively considered in the 
literature. For example,† Tippett (1925) gives the second, third and fourth moments of the extreme
order statistics for a few sample sizes. Hojo (1931) examines the sampling variation of the median, 
quartiles and interquartile distance in samples from normal populations and computes a large number 
of integrals for this purpose. Cole (1951) produces a very simple recurrence relationship between the 
‘normalized’ moments which enables the normalized moments for all samples of size no greater than n 
to be obtained, by successive differencing, from the ‘normalized’ moments of the extreme order 
statistics in samples of size $m < n$.‡ Hastings, Mosteller, Tukey & Winsor (1947) give, among other
results, the means, variances, covariances and correlations of order statistics in samples of ten or 
less from a normal population, some of the results being given to only two decimal places because 
of the extreme labour of the computation. Jones (1948) and Godwin (1949a,b) have obtained exact 
values for some of the lower moments. In particular, Godwin (1949a, p. 283) proceeds in a more 
systematic manner, and approaches most closely the essential idea upon which this paper is based. 

In general, it may be said that many statisticians have attacked the problem in a rather disjointed 
and fragmentary manner, usually from the small sample end, but have failed to develop a systematic 
attack which shall at the same time throw light on the interconnexion between the moments and enable 
the computation of the moments to become an economic proposition. It is believed that these con- 
ditions are met by this paper. 


1. THE REDUCTION OF A CERTAIN GENERALIZED CLASS OF INTEGRALS 


We shall here consider the properties of a class of integrals as defined in equation (1). This 
equation gives the general form of integral§ needed in the derivation of the series (see 
equations (65) and (66)) for the moments of the order statistics. In this section we shall 
show how any member of the class of integrals can be expressed as the series (23) depending 
on functions $\phi(0; \alpha; \beta, \gamma)$ and containing coefficients whose values are listed at the end
of the section. 





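Let then
\[
\phi(s; \alpha; \beta, \gamma) = \int_{-\infty}^{\infty} x^s z^\alpha p^\beta q^\gamma \, dx, \tag{1}
\]
where
\[
z = (2\pi)^{-1/2} e^{-x^2/2}, \tag{2}
\]
\[
p = (2\pi)^{-1/2} \int_{-\infty}^{x} e^{-t^2/2} \, dt, \qquad q = 1 - p, \tag{3}
\]
and $s$, $\beta$ and $\gamma$ are non-negative integers, while $\alpha$ is real and positive.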


† The references given are not intended to provide an exhaustive bibliography.

‡ The author has found that the cumulative error induced by the differencing increases too rapidly
for the moments of the order statistics, except for those which are only a few places removed from the
extreme, to be computed with a reasonable degree of accuracy.

§ Such an integral, with $s = 0$ and $\alpha \neq 1$, arises also in connexion with the moment generating function
of the square of the general order statistic in samples from normal populations (see equation (51)).
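On integration by parts,
\[
\begin{aligned}
\phi(s; \alpha; \beta, \gamma) &= -\int_{-\infty}^{\infty} x^{s-1} z^{\alpha-1} p^\beta q^\gamma \frac{dz}{dx} \, dx \\
&= \int_{-\infty}^{\infty} z \frac{d}{dx}\big(x^{s-1} z^{\alpha-1} p^\beta q^\gamma\big) \, dx \\
&= (s-1)\int_{-\infty}^{\infty} x^{s-2} z^\alpha p^\beta q^\gamma \, dx - (\alpha-1)\int_{-\infty}^{\infty} x^s z^\alpha p^\beta q^\gamma \, dx \\
&\quad + \beta\int_{-\infty}^{\infty} x^{s-1} z^{\alpha+1} p^{\beta-1} q^\gamma \, dx - \gamma\int_{-\infty}^{\infty} x^{s-1} z^{\alpha+1} p^\beta q^{\gamma-1} \, dx \quad (s = 1, 2, \ldots).
\end{aligned}
\tag{4}
\]
Hence
\[
\phi(s; \alpha; \beta, \gamma) = \frac{s-1}{\alpha}\,\phi(s-2; \alpha; \beta, \gamma) + \frac{\beta}{\alpha}\,\phi(s-1; \alpha+1; \beta-1, \gamma)
- \frac{\gamma}{\alpha}\,\phi(s-1; \alpha+1; \beta, \gamma-1) \quad (s = 1, 2, \ldots). \tag{5}
\]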
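The reduction formula (5) can be checked numerically for particular arguments. The sketch below is an illustration only: it evaluates $\phi$ by composite Simpson quadrature on a truncated range, taking $q = 1 - p$, and compares the two sides of (5). The function names, the truncation at ±12 and the number of panels are arbitrary choices, not part of the paper.

```python
import math


def z(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def p(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(s, alpha, beta, gamma, lo=-12.0, hi=12.0, n=4000):
    """Simpson estimate of the integral of x^s z^alpha p^beta q^gamma (n even)."""
    def f(x):
        px = p(x)
        return x**s * z(x) ** alpha * px**beta * (1.0 - px) ** gamma

    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def reduction_rhs(s, alpha, beta, gamma):
    """Right-hand side of (5) for s >= 1; terms with a zero coefficient are skipped."""
    total = 0.0
    if s >= 2:
        total += (s - 1) / alpha * phi(s - 2, alpha, beta, gamma)
    if beta > 0:
        total += beta / alpha * phi(s - 1, alpha + 1, beta - 1, gamma)
    if gamma > 0:
        total -= gamma / alpha * phi(s - 1, alpha + 1, beta, gamma - 1)
    return total


if __name__ == "__main__":
    print(phi(3, 1.5, 2, 1), reduction_rhs(3, 1.5, 2, 1))
```

Skipping the zero-coefficient terms avoids evaluating negative powers of $p$ or $q$, which (5) never actually requires.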
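Let $A$ denote the operator which increases $\alpha$ by 1 and $B$ and $\Gamma$ the operators which decrease
$\beta$ and $\gamma$, respectively, by 1. It is useful also to introduce the associated operators $P$, $H$, $K$
and $L$ defined by
\[
P = \frac{1}{\alpha} A, \tag{6}
\]
\[
H = \beta B, \tag{7}
\]
\[
K = \gamma \Gamma, \tag{8}
\]
\[
L = H - K. \tag{9}
\]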


Certain rather obvious properties of these operators, as well as of functions of the operators,
which will be used in the sequel should be noted in passing. Any two operators which affect,
and therefore bind, different and disjoint sets of arguments commute with each other. Thus
$B$ commutes with $\Gamma$, $H^m$ with $K^n$ $(m, n = 0, 1, 2, \ldots)$, $P$ with $L$, $P^m$ with $L^n$, etc. We state
some additional properties of the operators. First,


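\[
P^m = \frac{1}{\alpha^{(-m)}} A^m \quad (m = 0, 1, 2, \ldots), \tag{10}
\]
\[
H^m = \beta^{(m)} B^m, \qquad K^m = \gamma^{(m)} \Gamma^m \quad (m = 0, 1, 2, \ldots), \tag{11}
\]
where $u^{(-m)}$ and $u^{(m)}$ denote the ascending and descending factorials, respectively, of degree
$m$ in $u$, i.e.
\[
u^{(-0)} = 1, \qquad u^{(-m)} = u(u+1)\ldots(u+m-1) \quad (m = 1, 2, \ldots), \tag{12}
\]
\[
u^{(0)} = 1, \qquad u^{(m)} = u(u-1)\ldots(u-m+1) \quad (m = 1, 2, \ldots), \tag{13}
\]
and $A^0$, $B^0$, $\Gamma^0$, etc., are defined as identity operators. The validity of the identities in (10)
and (11) follows by induction. For
\[
P^{m+1} = P\,\frac{1}{\alpha^{(-m)}} A^m = \frac{1}{\alpha}\,\frac{1}{(\alpha+1)^{(-m)}} A^{m+1} = \frac{1}{\alpha^{(-(m+1))}} A^{m+1},
\]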
and since (10) is true for $m = 0$, it is identically true for all non-negative integral $m$. Likewise
(11) is valid for all such values of $m$, since
\[
H\beta^{(m)} B^m = \beta(\beta-1)^{(m)} B^{m+1} = \beta^{(m+1)} B^{m+1}.
\]
From (10) and (11) it follows directly that $P$, $H$ and $K$ are governed by the index law:
\[
P^m P^n = P^{m+n} = P^n P^m \quad (m, n = 0, 1, 2, \ldots), \tag{14}
\]
\[
H^m H^n = H^{m+n} = H^n H^m, \qquad K^m K^n = K^{m+n} = K^n K^m \quad (m, n = 0, 1, 2, \ldots). \tag{15}
\]
Secondly,
\[
L^m = \sum_{j=0}^{m} (-)^j \binom{m}{j} K^j H^{m-j} = \sum_{j=0}^{m} (-)^j \binom{m}{j} \gamma^{(j)} \beta^{(m-j)} \Gamma^j B^{m-j} \quad (m = 0, 1, 2, \ldots), \tag{16}
\]
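i.e. $L^m$ may be obtained by the formal binomial expansion of $(H - K)^m$. The relationship
(16) is again proved by induction. The proof will not be given here, being identical in form
with the text-book proof of the binomial expansion for positive integral indices, when the
elements are constants and not operators. Again
\[
L^m L^n = L^{m+n} = L^n L^m \quad (m, n = 0, 1, 2, \ldots). \tag{17}
\]
For
\[
\begin{aligned}
L^m L^n &= \sum_{i=0}^{m} (-)^i \binom{m}{i} K^i H^{m-i} \sum_{j=0}^{n} (-)^j \binom{n}{j} K^j H^{n-j} \\
&= \sum_{i=0}^{m} \sum_{j=0}^{n} (-)^{i+j} \binom{m}{i}\binom{n}{j} K^{i+j} H^{m+n-i-j} \\
&= \sum_{r=0}^{m+n} (-)^r \sum_{i+j=r} \binom{m}{i}\binom{n}{j} K^r H^{m+n-r} \\
&= \sum_{r=0}^{m+n} (-)^r \binom{m+n}{r} K^r H^{m+n-r}.
\end{aligned}
\]
Similarly,
\[
L^n L^m = \sum_{r=0}^{m+n} (-)^r \binom{m+n}{r} K^r H^{m+n-r},
\]
and hence (17) is proved.
Finally, $L$ and the operation of integration commute. Specifically,
\[
L \int^{\theta} f(\theta'; \beta, \gamma) \, d\theta' = \int^{\theta} \big(L f(\theta'; \beta, \gamma)\big) \, d\theta'. \tag{18}
\]
For, let
\[
F(\theta; \beta, \gamma) = \int^{\theta} f(\theta'; \beta, \gamma) \, d\theta'.
\]
Then
\[
\begin{aligned}
L \int^{\theta} f(\theta'; \beta, \gamma) \, d\theta' &= (\beta B - \gamma \Gamma) F(\theta; \beta, \gamma) \\
&= \beta F(\theta; \beta-1, \gamma) - \gamma F(\theta; \beta, \gamma-1) \\
&= \int^{\theta} \big(L f(\theta'; \beta, \gamma)\big) \, d\theta'.
\end{aligned}
\]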
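We note from (16) that if $\omega$ is independent of $\beta$ and $\gamma$, then
\[
L^m \omega = \omega \sum_{j=0}^{m} (-)^j \binom{m}{j} \gamma^{(j)} \beta^{(m-j)}, \tag{19}
\]
and that if $\beta$ and $\gamma$ are non-negative integers with $m > \beta + \gamma$, $m$ being integral, then $L^m \equiv 0$:
\[
L^m = 0 \quad (m = \beta+\gamma+1, \beta+\gamma+2, \ldots;\ \beta, \gamma = 0, 1, 2, \ldots). \tag{20}
\]
Also, from (18),
\[
L^m \int^{\theta} f(\theta'; \beta, \gamma) \, d\theta' = \int^{\theta} \big(L^m f(\theta'; \beta, \gamma)\big) \, d\theta' \quad (m = 0, 1, 2, \ldots). \tag{21}
\]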
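The vanishing of $L^m$ in (20) can be seen from the numerical factors $\gamma^{(j)}\beta^{(m-j)}$ in (16) and (19): for integral $\beta$ and $\gamma$ at least one descending factorial is zero in every term once $m > \beta + \gamma$. A minimal sketch follows; the function names are illustrative.

```python
from math import comb


def descending_factorial(u, m):
    # u^(m) = u(u-1)...(u-m+1), with u^(0) = 1, as in (13).
    out = 1
    for r in range(m):
        out *= u - r
    return out


def l_power_coefficients(m, beta, gamma):
    """Coefficients (-)^j C(m,j) gamma^(j) beta^(m-j) for j = 0..m, as in (16)."""
    return [
        (-1) ** j * comb(m, j) * descending_factorial(gamma, j)
        * descending_factorial(beta, m - j)
        for j in range(m + 1)
    ]


if __name__ == "__main__":
    print(l_power_coefficients(3, 2, 1))  # m = beta + gamma: one term survives
    print(l_power_coefficients(4, 2, 1))  # m > beta + gamma: every term vanishes
```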


Reverting to equation (5), this may be written symbolically in the form
\[
\phi(s; \alpha; \beta, \gamma) = \frac{s-1}{\alpha}\,\phi(s-2; \alpha; \beta, \gamma) + PL\,\phi(s-1; \alpha; \beta, \gamma) \quad (s = 1, 2, \ldots). \tag{22}
\]
The fundamental reduction formula (22) will be used to reduce $\phi(s; \alpha; \beta, \gamma)$, when $s$ is a
positive integer, to a linear combination of $\phi$-functions with zero as the first argument. In
fact we shall show by induction that the solution of the recurrence equation (22) is expressed
by
\[
\begin{aligned}
\phi(2k+1; \alpha; \beta, \gamma) &= \sum_{i=0}^{k} a_{2k+1,\,2i+1}(\alpha)\,(PL)^{2i+1}\,\phi(0; \alpha; \beta, \gamma), \\
\phi(2k; \alpha; \beta, \gamma) &= \sum_{i=0}^{k} a_{2k,\,2i}(\alpha)\,(PL)^{2i}\,\phi(0; \alpha; \beta, \gamma)
\end{aligned}
\quad (k = 0, 1, 2, \ldots), \tag{23}
\]
provided the $a$'s are appropriately chosen. (The value $k = 0$ is included for the sake of
completeness.)

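Let it be assumed that (23) is true for some non-negative integer $k$. On using this assumption
together with (22),
\[
\begin{aligned}
\phi(2k+2; \alpha; \beta, \gamma) &= \frac{2k+1}{\alpha}\,\phi(2k; \alpha; \beta, \gamma) + PL\,\phi(2k+1; \alpha; \beta, \gamma) \\
&= \frac{2k+1}{\alpha} \sum_{i=0}^{k} a_{2k,\,2i}(\alpha)\,(PL)^{2i}\,\phi(0; \alpha; \beta, \gamma)
 + PL \sum_{i=0}^{k} a_{2k+1,\,2i+1}(\alpha)\,(PL)^{2i+1}\,\phi(0; \alpha; \beta, \gamma) \\
&= \frac{2k+1}{\alpha} \sum_{i=0}^{k} a_{2k,\,2i}(\alpha)\,(PL)^{2i}\,\phi(0; \alpha; \beta, \gamma)
 + \sum_{i=0}^{k} a_{2k+1,\,2i+1}(\alpha+1)\,(PL)^{2i+2}\,\phi(0; \alpha; \beta, \gamma) \\
&= \sum_{i=0}^{k+1} a_{2k+2,\,2i}(\alpha)\,(PL)^{2i}\,\phi(0; \alpha; \beta, \gamma),
\end{aligned}
\]
provided
\[
a_{2k+2,\,2i}(\alpha) = \frac{2k+1}{\alpha}\,a_{2k,\,2i}(\alpha) + a_{2k+1,\,2i-1}(\alpha+1) \quad (i = 1, 2, \ldots, k), \tag{24}
\]
\[
a_{2k+2,\,0}(\alpha) = \frac{2k+1}{\alpha}\,a_{2k,\,0}(\alpha), \tag{25}
\]
\[
a_{2k+2,\,2k+2}(\alpha) = a_{2k+1,\,2k+1}(\alpha+1). \tag{26}
\]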
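Similarly, and using the result just established,
$$\begin{aligned}
\phi(2k+3;\alpha;\beta,\gamma) &= \frac{2k+2}{\alpha}\,\phi(2k+1;\alpha;\beta,\gamma) + PL\,\phi(2k+2;\alpha;\beta,\gamma)\\
&= \frac{2k+2}{\alpha}\sum_{i=0}^{k} a_{2k+1,2i+1}(\alpha)(PL)^{2i+1}\phi(0;\alpha;\beta,\gamma) + PL\sum_{i=0}^{k+1} a_{2k+2,2i}(\alpha)(PL)^{2i}\phi(0;\alpha;\beta,\gamma)\\
&= \frac{2k+2}{\alpha}\sum_{i=0}^{k} a_{2k+1,2i+1}(\alpha)(PL)^{2i+1}\phi(0;\alpha;\beta,\gamma) + \sum_{i=0}^{k+1} a_{2k+2,2i}(\alpha+1)(PL)^{2i+1}\phi(0;\alpha;\beta,\gamma)\\
&= \sum_{i=0}^{k+1} a_{2k+3,2i+1}(\alpha)(PL)^{2i+1}\phi(0;\alpha;\beta,\gamma),
\end{aligned}$$
provided
$$a_{2k+3,2i+1}(\alpha) = \frac{2k+2}{\alpha}\,a_{2k+1,2i+1}(\alpha) + a_{2k+2,2i}(\alpha+1) \qquad (i=0,1,2,\ldots,k), \tag{27}$$
$$a_{2k+3,2k+3}(\alpha) = a_{2k+2,2k+2}(\alpha+1). \tag{28}$$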


It follows that if (23) is valid for some non-negative integer $k$, it is valid also for $k+1$, provided the $a$'s satisfy equations (24)-(28). Consequently (23) is valid for all non-negative integral $k$, being obviously true for $k=0$ with $a_{0,0}(\alpha)=1$ and $a_{1,1}(\alpha)=1$.

The equations (24)-(28) may be solved for the $a$'s quite systematically. In particular, by successive application of (25),
$$a_{2k+2,0}(\alpha) = 1\cdot 3\cdot 5\cdots(2k+1)/\alpha^{k+1}. \tag{29}$$
Again, by successive application of (26) and (28),
$$a_{2k+3,2k+3}(\alpha) = a_{2k+2,2k+2}(\alpha+1) = \ldots = a_{0,0}(\alpha+2k+3) = 1. \tag{30}$$
Similarly,
$$a_{2k+2,2k+2}(\alpha) = 1. \tag{30'}$$
Setting $i=k$ in (24) and using (27), (30) and (30'),
$$\begin{aligned}
a_{2k+2,2k}(\alpha) &= \frac{2k+1}{\alpha}\,a_{2k,2k}(\alpha) + a_{2k+1,2k-1}(\alpha+1)\\
&= \frac{2k+1}{\alpha} + a_{2k+1,2k-1}(\alpha+1)\\
&= \frac{2k+1}{\alpha} + \frac{2k}{\alpha+1} + a_{2k,2k-2}(\alpha+2) = \ldots\\
&= \frac{2k+1}{\alpha} + \frac{2k}{\alpha+1} + \ldots + \frac{2}{\alpha+2k-1} + \frac{1}{\alpha+2k}.
\end{aligned} \tag{31}$$
Similarly, setting $i=k$ in (27) and using (30) and (31),
$$a_{2k+3,2k+1}(\alpha) = \frac{2k+2}{\alpha} + \frac{2k+1}{\alpha+1} + \ldots + \frac{2}{\alpha+2k} + \frac{1}{\alpha+2k+1}. \tag{31'}$$


The equations (29), (30), (30'), (31) and (31') give the forms which the $a$'s take in particular cases. In general, however, the structure of the coefficients is more complicated, but they can be determined systematically by successive application of (24) and (27). The values of the coefficients relating to the various values of $s$ up to and including $s=10$ are given below:

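$s=0$:
$$a_{0,0}(\alpha) = 1$$
$s=1$:
$$a_{1,1}(\alpha) = 1$$
$s=2$:
$$a_{2,0}(\alpha) = \frac{1}{\alpha},\qquad a_{2,2}(\alpha) = 1$$
$s=3$:
$$a_{3,1}(\alpha) = \frac{2}{\alpha}+\frac{1}{\alpha+1},\qquad a_{3,3}(\alpha) = 1$$
$s=4$:
$$a_{4,0}(\alpha) = \frac{3}{\alpha^{2}},\qquad a_{4,2}(\alpha) = \frac{3}{\alpha}+\frac{2}{\alpha+1}+\frac{1}{\alpha+2},\qquad a_{4,4}(\alpha) = 1$$
$s=5$:
$$\begin{aligned}
a_{5,1}(\alpha) &= \frac{4}{\alpha}\left(\frac{2}{\alpha}+\frac{1}{\alpha+1}\right)+\frac{3}{(\alpha+1)^{2}}\\
a_{5,3}(\alpha) &= \frac{4}{\alpha}+\frac{3}{\alpha+1}+\frac{2}{\alpha+2}+\frac{1}{\alpha+3}\\
a_{5,5}(\alpha) &= 1
\end{aligned}$$
$s=6$:
$$\begin{aligned}
a_{6,0}(\alpha) &= \frac{15}{\alpha^{3}}\\
a_{6,2}(\alpha) &= \frac{5}{\alpha}\left(\frac{3}{\alpha}+\frac{2}{\alpha+1}+\frac{1}{\alpha+2}\right)+\frac{4}{\alpha+1}\left(\frac{2}{\alpha+1}+\frac{1}{\alpha+2}\right)+\frac{3}{(\alpha+2)^{2}}\\
a_{6,4}(\alpha) &= \frac{5}{\alpha}+\frac{4}{\alpha+1}+\frac{3}{\alpha+2}+\frac{2}{\alpha+3}+\frac{1}{\alpha+4}\\
a_{6,6}(\alpha) &= 1
\end{aligned}$$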
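$s=7$:
$$\begin{aligned}
a_{7,1}(\alpha) &= \frac{6}{\alpha}\left[\frac{4}{\alpha}\left(\frac{2}{\alpha}+\frac{1}{\alpha+1}\right)+\frac{3}{(\alpha+1)^{2}}\right]+\frac{15}{(\alpha+1)^{3}}\\
a_{7,3}(\alpha) &= \frac{6}{\alpha}\left(\frac{4}{\alpha}+\frac{3}{\alpha+1}+\frac{2}{\alpha+2}+\frac{1}{\alpha+3}\right)+\frac{5}{\alpha+1}\left(\frac{3}{\alpha+1}+\frac{2}{\alpha+2}+\frac{1}{\alpha+3}\right)\\
&\quad+\frac{4}{\alpha+2}\left(\frac{2}{\alpha+2}+\frac{1}{\alpha+3}\right)+\frac{3}{(\alpha+3)^{2}}\\
a_{7,5}(\alpha) &= \frac{6}{\alpha}+\frac{5}{\alpha+1}+\frac{4}{\alpha+2}+\frac{3}{\alpha+3}+\frac{2}{\alpha+4}+\frac{1}{\alpha+5}\\
a_{7,7}(\alpha) &= 1
\end{aligned}$$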
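$s=8$:
$$\begin{aligned}
a_{8,0}(\alpha) &= \frac{105}{\alpha^{4}}\\
a_{8,2}(\alpha) &= \frac{7}{\alpha}\left[\frac{5}{\alpha}\left(\frac{3}{\alpha}+\frac{2}{\alpha+1}+\frac{1}{\alpha+2}\right)+\frac{4}{\alpha+1}\left(\frac{2}{\alpha+1}+\frac{1}{\alpha+2}\right)+\frac{3}{(\alpha+2)^{2}}\right]\\
&\quad+\frac{6}{\alpha+1}\left[\frac{4}{\alpha+1}\left(\frac{2}{\alpha+1}+\frac{1}{\alpha+2}\right)+\frac{3}{(\alpha+2)^{2}}\right]+\frac{15}{(\alpha+2)^{3}}\\
a_{8,4}(\alpha) &= \frac{7}{\alpha}\left(\frac{5}{\alpha}+\frac{4}{\alpha+1}+\frac{3}{\alpha+2}+\frac{2}{\alpha+3}+\frac{1}{\alpha+4}\right)+\frac{6}{\alpha+1}\left(\frac{4}{\alpha+1}+\frac{3}{\alpha+2}+\frac{2}{\alpha+3}+\frac{1}{\alpha+4}\right)\\
&\quad+\frac{5}{\alpha+2}\left(\frac{3}{\alpha+2}+\frac{2}{\alpha+3}+\frac{1}{\alpha+4}\right)+\frac{4}{\alpha+3}\left(\frac{2}{\alpha+3}+\frac{1}{\alpha+4}\right)+\frac{3}{(\alpha+4)^{2}}\\
a_{8,6}(\alpha) &= \frac{7}{\alpha}+\frac{6}{\alpha+1}+\frac{5}{\alpha+2}+\frac{4}{\alpha+3}+\frac{3}{\alpha+4}+\frac{2}{\alpha+5}+\frac{1}{\alpha+6}\\
a_{8,8}(\alpha) &= 1
\end{aligned}$$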
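$s=9$:
$$\begin{aligned}
a_{9,1}(\alpha) &= \frac{8}{\alpha}\left\{\frac{6}{\alpha}\left[\frac{4}{\alpha}\left(\frac{2}{\alpha}+\frac{1}{\alpha+1}\right)+\frac{3}{(\alpha+1)^{2}}\right]+\frac{15}{(\alpha+1)^{3}}\right\}+\frac{105}{(\alpha+1)^{4}}\\
a_{9,3}(\alpha) &= \frac{8}{\alpha}\left[\frac{6}{\alpha}\left(\frac{4}{\alpha}+\frac{3}{\alpha+1}+\frac{2}{\alpha+2}+\frac{1}{\alpha+3}\right)+\frac{5}{\alpha+1}\left(\frac{3}{\alpha+1}+\frac{2}{\alpha+2}+\frac{1}{\alpha+3}\right)\right.\\
&\qquad\left.+\frac{4}{\alpha+2}\left(\frac{2}{\alpha+2}+\frac{1}{\alpha+3}\right)+\frac{3}{(\alpha+3)^{2}}\right]\\
&\quad+\frac{7}{\alpha+1}\left[\frac{5}{\alpha+1}\left(\frac{3}{\alpha+1}+\frac{2}{\alpha+2}+\frac{1}{\alpha+3}\right)+\frac{4}{\alpha+2}\left(\frac{2}{\alpha+2}+\frac{1}{\alpha+3}\right)+\frac{3}{(\alpha+3)^{2}}\right]\\
&\quad+\frac{6}{\alpha+2}\left[\frac{4}{\alpha+2}\left(\frac{2}{\alpha+2}+\frac{1}{\alpha+3}\right)+\frac{3}{(\alpha+3)^{2}}\right]+\frac{15}{(\alpha+3)^{3}}\\
a_{9,5}(\alpha) &= \frac{8}{\alpha}\left(\frac{6}{\alpha}+\frac{5}{\alpha+1}+\frac{4}{\alpha+2}+\frac{3}{\alpha+3}+\frac{2}{\alpha+4}+\frac{1}{\alpha+5}\right)\\
&\quad+\frac{7}{\alpha+1}\left(\frac{5}{\alpha+1}+\frac{4}{\alpha+2}+\frac{3}{\alpha+3}+\frac{2}{\alpha+4}+\frac{1}{\alpha+5}\right)\\
&\quad+\frac{6}{\alpha+2}\left(\frac{4}{\alpha+2}+\frac{3}{\alpha+3}+\frac{2}{\alpha+4}+\frac{1}{\alpha+5}\right)+\frac{5}{\alpha+3}\left(\frac{3}{\alpha+3}+\frac{2}{\alpha+4}+\frac{1}{\alpha+5}\right)\\
&\quad+\frac{4}{\alpha+4}\left(\frac{2}{\alpha+4}+\frac{1}{\alpha+5}\right)+\frac{3}{(\alpha+5)^{2}}\\
a_{9,7}(\alpha) &= \frac{8}{\alpha}+\frac{7}{\alpha+1}+\frac{6}{\alpha+2}+\frac{5}{\alpha+3}+\frac{4}{\alpha+4}+\frac{3}{\alpha+5}+\frac{2}{\alpha+6}+\frac{1}{\alpha+7}\\
a_{9,9}(\alpha) &= 1
\end{aligned}$$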
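$s=10$:
$$\begin{aligned}
a_{10,0}(\alpha) &= \frac{945}{\alpha^{5}}\\
a_{10,2}(\alpha) &= \frac{9}{\alpha}\left\{\frac{7}{\alpha}\left[\frac{5}{\alpha}\left(\frac{3}{\alpha}+\frac{2}{\alpha+1}+\frac{1}{\alpha+2}\right)+\frac{4}{\alpha+1}\left(\frac{2}{\alpha+1}+\frac{1}{\alpha+2}\right)+\frac{3}{(\alpha+2)^{2}}\right]\right.\\
&\qquad\left.+\frac{6}{\alpha+1}\left[\frac{4}{\alpha+1}\left(\frac{2}{\alpha+1}+\frac{1}{\alpha+2}\right)+\frac{3}{(\alpha+2)^{2}}\right]+\frac{15}{(\alpha+2)^{3}}\right\}\\
&\quad+\frac{8}{\alpha+1}\left\{\frac{6}{\alpha+1}\left[\frac{4}{\alpha+1}\left(\frac{2}{\alpha+1}+\frac{1}{\alpha+2}\right)+\frac{3}{(\alpha+2)^{2}}\right]+\frac{15}{(\alpha+2)^{3}}\right\}+\frac{105}{(\alpha+2)^{4}}\\
a_{10,4}(\alpha) &= \frac{9}{\alpha}\left[\frac{7}{\alpha}\left(\frac{5}{\alpha}+\frac{4}{\alpha+1}+\frac{3}{\alpha+2}+\frac{2}{\alpha+3}+\frac{1}{\alpha+4}\right)+\frac{6}{\alpha+1}\left(\frac{4}{\alpha+1}+\frac{3}{\alpha+2}+\frac{2}{\alpha+3}+\frac{1}{\alpha+4}\right)\right.\\
&\qquad\left.+\frac{5}{\alpha+2}\left(\frac{3}{\alpha+2}+\frac{2}{\alpha+3}+\frac{1}{\alpha+4}\right)+\frac{4}{\alpha+3}\left(\frac{2}{\alpha+3}+\frac{1}{\alpha+4}\right)+\frac{3}{(\alpha+4)^{2}}\right]\\
&\quad+\frac{8}{\alpha+1}\left[\frac{6}{\alpha+1}\left(\frac{4}{\alpha+1}+\frac{3}{\alpha+2}+\frac{2}{\alpha+3}+\frac{1}{\alpha+4}\right)+\frac{5}{\alpha+2}\left(\frac{3}{\alpha+2}+\frac{2}{\alpha+3}+\frac{1}{\alpha+4}\right)\right.\\
&\qquad\left.+\frac{4}{\alpha+3}\left(\frac{2}{\alpha+3}+\frac{1}{\alpha+4}\right)+\frac{3}{(\alpha+4)^{2}}\right]\\
&\quad+\frac{7}{\alpha+2}\left[\frac{5}{\alpha+2}\left(\frac{3}{\alpha+2}+\frac{2}{\alpha+3}+\frac{1}{\alpha+4}\right)+\frac{4}{\alpha+3}\left(\frac{2}{\alpha+3}+\frac{1}{\alpha+4}\right)+\frac{3}{(\alpha+4)^{2}}\right]\\
&\quad+\frac{6}{\alpha+3}\left[\frac{4}{\alpha+3}\left(\frac{2}{\alpha+3}+\frac{1}{\alpha+4}\right)+\frac{3}{(\alpha+4)^{2}}\right]+\frac{15}{(\alpha+4)^{3}}\\
a_{10,6}(\alpha) &= \frac{9}{\alpha}\left(\frac{7}{\alpha}+\frac{6}{\alpha+1}+\frac{5}{\alpha+2}+\frac{4}{\alpha+3}+\frac{3}{\alpha+4}+\frac{2}{\alpha+5}+\frac{1}{\alpha+6}\right)\\
&\quad+\frac{8}{\alpha+1}\left(\frac{6}{\alpha+1}+\frac{5}{\alpha+2}+\frac{4}{\alpha+3}+\frac{3}{\alpha+4}+\frac{2}{\alpha+5}+\frac{1}{\alpha+6}\right)\\
&\quad+\frac{7}{\alpha+2}\left(\frac{5}{\alpha+2}+\frac{4}{\alpha+3}+\frac{3}{\alpha+4}+\frac{2}{\alpha+5}+\frac{1}{\alpha+6}\right)+\frac{6}{\alpha+3}\left(\frac{4}{\alpha+3}+\frac{3}{\alpha+4}+\frac{2}{\alpha+5}+\frac{1}{\alpha+6}\right)\\
&\quad+\frac{5}{\alpha+4}\left(\frac{3}{\alpha+4}+\frac{2}{\alpha+5}+\frac{1}{\alpha+6}\right)+\frac{4}{\alpha+5}\left(\frac{2}{\alpha+5}+\frac{1}{\alpha+6}\right)+\frac{3}{(\alpha+6)^{2}}\\
a_{10,8}(\alpha) &= \frac{9}{\alpha}+\frac{8}{\alpha+1}+\frac{7}{\alpha+2}+\frac{6}{\alpha+3}+\frac{5}{\alpha+4}+\frac{4}{\alpha+5}+\frac{3}{\alpha+6}+\frac{2}{\alpha+7}+\frac{1}{\alpha+8}\\
a_{10,10}(\alpha) &= 1
\end{aligned}$$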
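The case $\alpha=1$ is of special interest. On substitution the following numerical values are derived:

$s=0$: $a_{0,0}(1)=1$
$s=1$: $a_{1,1}(1)=1$
$s=2$: $a_{2,0}(1)=1$, $a_{2,2}(1)=1$
$s=3$: $a_{3,1}(1)=2\tfrac{1}{2}$, $a_{3,3}(1)=1$
$s=4$: $a_{4,0}(1)=3$, $a_{4,2}(1)=4\tfrac{1}{3}$, $a_{4,4}(1)=1$
$s=5$: $a_{5,1}(1)=10\tfrac{3}{4}$, $a_{5,3}(1)=6\tfrac{5}{12}$, $a_{5,5}(1)=1$
$s=6$: $a_{6,0}(1)=15$, $a_{6,2}(1)=24\tfrac{2}{3}$, $a_{6,4}(1)=8\tfrac{7}{10}$, $a_{6,6}(1)=1$
$s=7$: $a_{7,1}(1)=66\tfrac{3}{8}$, $a_{7,3}(1)=45\tfrac{137}{144}$, $a_{7,5}(1)=11\tfrac{3}{20}$, $a_{7,7}(1)=1$
$s=8$: $a_{8,0}(1)=105$, $a_{8,2}(1)=182\tfrac{2}{9}$, $a_{8,4}(1)=75\tfrac{49}{75}$, $a_{8,6}(1)=13\tfrac{26}{35}$, $a_{8,8}(1)=1$
$s=9$: $a_{9,1}(1)=537\tfrac{9}{16}$, $a_{9,3}(1)=396\tfrac{143}{192}$, $a_{9,5}(1)=114\tfrac{283}{400}$, $a_{9,7}(1)=16\tfrac{129}{280}$, $a_{9,9}(1)=1$
$s=10$: $a_{10,0}(1)=945$, $a_{10,2}(1)=1679\tfrac{14}{27}$, $a_{10,4}(1)=749\tfrac{691}{900}$, $a_{10,6}(1)=163\tfrac{2116}{2205}$, $a_{10,8}(1)=19\tfrac{73}{252}$, $a_{10,10}(1)=1$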
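
The recurrences (24)-(28) collapse into the single rule $a_{s,j}(\alpha) = \frac{s-1}{\alpha}a_{s-2,j}(\alpha) + a_{s-1,j-1}(\alpha+1)$ with $a_{s,s}(\alpha)=1$ and $a_{s,j}(\alpha)=0$ outside the range $0\le j\le s$, $s-j$ even. The following is a minimal sketch, not part of the original paper, that evaluates the coefficients in exact rational arithmetic; the function name `a` and its argument order are illustrative choices.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def a(s, j, alpha):
    """Coefficient a_{s,j}(alpha) built from the recurrences (24)-(28)."""
    alpha = Fraction(alpha)
    # Coefficients vanish unless 0 <= j <= s and s - j is even.
    if j < 0 or j > s or (s - j) % 2:
        return Fraction(0)
    # a_{s,s}(alpha) = 1, equations (30) and (30').
    if j == s:
        return Fraction(1)
    # Combined form of (24), (25) and (27).
    return Fraction(s - 1) / alpha * a(s - 2, j, alpha) + a(s - 1, j - 1, alpha + 1)


if __name__ == "__main__":
    # Reproduce the numerical values for alpha = 1, s = 0, 1, ..., 10.
    for s in range(11):
        row = ", ".join(f"a[{s},{j}] = {a(s, j, 1)}" for j in range(s % 2, s + 1, 2))
        print(row)
```

Exact fractions make it easy to compare the output with the mixed fractions tabulated for $\alpha=1$.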


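We conclude this section by quoting a simple numerical example which will make the use of the formulae (23) sufficiently clear. The second of the two formulae gives, for $k=2$,
$$\phi(4;\alpha;\beta,\gamma) = \bigl[a_{4,0}(\alpha) + a_{4,2}(\alpha)(PL)^{2} + a_{4,4}(\alpha)(PL)^{4}\bigr]\phi(0;\alpha;\beta,\gamma).$$
To illustrate further, let $\beta+\gamma<4$ (sample size $n<5$). Then, from (20), the last term in the above expansion vanishes for all permissible values of $\alpha$. Hence, on using (10) and (16),
$$\begin{aligned}
\phi(4;\alpha;\beta,\gamma) &= a_{4,0}(\alpha)\,\phi(0;\alpha;\beta,\gamma) + a_{4,2}(\alpha)\,P^{2}L^{2}\phi(0;\alpha;\beta,\gamma)\\
&= a_{4,0}(\alpha)\,\phi(0;\alpha;\beta,\gamma) + a_{4,2}(\alpha)\,P^{2}\{\beta(\beta-1)B^{2} - 2\beta\gamma B\Gamma + \gamma(\gamma-1)\Gamma^{2}\}\,\phi(0;\alpha;\beta,\gamma)\\
&= a_{4,0}(\alpha)\,\phi(0;\alpha;\beta,\gamma) + a_{4,2}(\alpha)\,P^{2}\{\beta(\beta-1)\phi(0;\alpha;\beta-2,\gamma) - 2\beta\gamma\,\phi(0;\alpha;\beta-1,\gamma-1) + \gamma(\gamma-1)\phi(0;\alpha;\beta,\gamma-2)\}\\
&= a_{4,0}(\alpha)\,\phi(0;\alpha;\beta,\gamma) + a_{4,2}(\alpha)\,\frac{1}{\alpha(\alpha+1)}\{\beta(\beta-1)\phi(0;\alpha+2;\beta-2,\gamma) - 2\beta\gamma\,\phi(0;\alpha+2;\beta-1,\gamma-1) + \gamma(\gamma-1)\phi(0;\alpha+2;\beta,\gamma-2)\}\\
&= \frac{3}{\alpha^{2}}\,\phi(0;\alpha;\beta,\gamma) + \left(\frac{3}{\alpha}+\frac{2}{\alpha+1}+\frac{1}{\alpha+2}\right)\frac{1}{\alpha(\alpha+1)}\\
&\qquad\times\{\beta(\beta-1)\phi(0;\alpha+2;\beta-2,\gamma) - 2\beta\gamma\,\phi(0;\alpha+2;\beta-1,\gamma-1) + \gamma(\gamma-1)\phi(0;\alpha+2;\beta,\gamma-2)\},\qquad \beta+\gamma<4.
\end{aligned}$$
In particular,
$$\phi(4;1;\beta,\gamma) = 3\,\phi(0;1;\beta,\gamma) + 4\tfrac{1}{3}\times\tfrac{1}{2}\{\beta(\beta-1)\phi(0;3;\beta-2,\gamma) - 2\beta\gamma\,\phi(0;3;\beta-1,\gamma-1) + \gamma(\gamma-1)\phi(0;3;\beta,\gamma-2)\},\qquad \beta+\gamma<4.$$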
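
As a numerical sanity check on the reduction just obtained, the sketch below evaluates both sides by Simpson's rule, assuming (as in §2) that $\phi(s;\alpha;\beta,\gamma)=\int x^{s}z^{\alpha}p^{\beta}q^{\gamma}\,dx$ with $z$ the standard normal density, $p$ its distribution function and $q=1-p$. The helper names, the integration range and the step count are illustrative assumptions, not part of the paper.

```python
import math


def z(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def p(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(s, alpha, beta, gamma, lo=-12.0, hi=12.0, n=4000):
    """Simpson's rule for the integral of x^s z^alpha p^beta q^gamma."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        px = p(x)
        total += w * x**s * z(x)**alpha * px**beta * (1.0 - px)**gamma
    return total * h / 3.0


def phi4_reduced(beta, gamma):
    """Right-hand side of the reduction of phi(4; 1; beta, gamma), beta + gamma < 4."""
    terms = [
        (beta * (beta - 1), beta - 2, gamma),
        (-2 * beta * gamma, beta - 1, gamma - 1),
        (gamma * (gamma - 1), beta, gamma - 2),
    ]
    # Terms with a zero coefficient are skipped, so no negative exponent is ever used.
    bracket = sum(c * phi(0, 3, b, g) for c, b, g in terms if c != 0)
    return 3.0 * phi(0, 1, beta, gamma) + (13.0 / 3.0) * 0.5 * bracket


if __name__ == "__main__":
    for beta, gamma in [(2, 1), (1, 1), (3, 0), (0, 2)]:
        print(beta, gamma, phi(4, 1, beta, gamma), phi4_reduced(beta, gamma))
```

The two printed columns should agree to within the quadrature error for every pair with $\beta+\gamma<4$.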


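2. THE GEOMETRICAL SIGNIFICANCE OF $\phi(0;\alpha;\beta,\gamma)$

The case of $\beta=0$, $\gamma=0$ is trivial. In this section $\beta+\gamma$ is a positive integer. We have
$$\begin{aligned}
\phi(0;\alpha;\beta,\gamma) &= \int_{-\infty}^{\infty} z^{\alpha}p^{\beta}q^{\gamma}\,dx\\
&= \int_{-\infty}^{\infty}\left(\frac{1}{(2\pi)^{1/2}}e^{-\frac12 x^{2}}\right)^{\alpha}\left(\int_{-\infty}^{x}\frac{1}{(2\pi)^{1/2}}e^{-\frac12 u^{2}}\,du\right)^{\beta}\left(\int_{x}^{\infty}\frac{1}{(2\pi)^{1/2}}e^{-\frac12 u^{2}}\,du\right)^{\gamma}dx\\
&= \frac{1}{(2\pi)^{\frac12(\alpha+\beta+\gamma)}}\int_{u_{\beta+\gamma}=x}^{\infty}\!\!\cdots\int_{u_{\beta+1}=x}^{\infty}\int_{u_{\beta}=-\infty}^{x}\!\!\cdots\int_{u_{1}=-\infty}^{x}\int_{x=-\infty}^{\infty}\exp\Bigl[-\tfrac12\Bigl(\alpha x^{2}+\sum_{i=1}^{\beta+\gamma}u_{i}^{2}\Bigr)\Bigr]\,dx\prod_{i=1}^{\beta+\gamma}du_{i}.
\end{aligned} \tag{32}$$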
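On applying the transformation
$$y_{i} = u_{i}-x \qquad (i=1,2,\ldots,\beta+\gamma), \tag{33}$$
$$\phi(0;\alpha;\beta,\gamma) = \frac{1}{(2\pi)^{\frac12(\alpha+\beta+\gamma)}}\int\!\!\cdots\!\!\int\exp\left[-\tfrac12(\alpha+\beta+\gamma)\left(x+\frac{\sum_{i}y_{i}}{\alpha+\beta+\gamma}\right)^{2}-\frac{Q}{2(\alpha+\beta+\gamma)}\right]dx\prod_{i=1}^{\beta+\gamma}dy_{i}, \tag{34}$$
where $Q$ is the positive-definite quadratic form
$$Q(y_{1},y_{2},\ldots,y_{\beta+\gamma}) = (\alpha+\beta+\gamma-1)\sum_{i}y_{i}^{2}-\sum_{i}{\sum_{j}}'\,y_{i}y_{j}, \tag{35}$$
the accented summations indicating that $j\neq i$. Integrating with respect to $x$,
$$\phi(0;\alpha;\beta,\gamma) = \frac{1}{(\alpha+\beta+\gamma)^{1/2}(2\pi)^{\frac12(\alpha+\beta+\gamma-1)}}\int_{y_{1}=-\infty}^{0}\!\!\cdots\int_{y_{\beta}=-\infty}^{0}\int_{y_{\beta+1}=0}^{\infty}\!\!\cdots\int_{y_{\beta+\gamma}=0}^{\infty}\exp\left[-\frac{Q}{2(\alpha+\beta+\gamma)}\right]dy_{1}\,dy_{2}\ldots dy_{\beta+\gamma}. \tag{36}$$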
The right-hand member of the equation (36) may be interpreted as a quantity proportional† to the total probability-mass in one of the generalized quadrants of the multivariate normal distribution with dispersion matrix $\alpha+\beta+\gamma$ times the reciprocal of the matrix of $Q$. The characteristic numbers of the matrix of $Q/(\alpha+\beta+\gamma-1)$ are $(\alpha+\beta+\gamma)/(\alpha+\beta+\gamma-1)$ with multiplicity $\beta+\gamma-1$ and $\alpha/(\alpha+\beta+\gamma-1)$ with multiplicity 1. Hence, the orthogonal transformation
$$\xi_{\beta+\gamma} = \sum_{i=1}^{\beta+\gamma}y_{i}\Big/\sqrt{(\beta+\gamma)},\qquad y_{s} = \sum_{i=1}^{\beta+\gamma}b_{si}\,\xi_{i}\quad (s=1,2,3,\ldots,\beta+\gamma), \tag{37}$$

† The proportionality factor is $(2\pi)^{-\frac12(\alpha-1)}\alpha^{-\frac12}$.
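reduces $Q$ to a sum of squares,
$$Q(y_{1},y_{2},\ldots,y_{\beta+\gamma}) = (\alpha+\beta+\gamma-1)\left\{\frac{\alpha+\beta+\gamma}{\alpha+\beta+\gamma-1}\sum_{s=1}^{\beta+\gamma-1}\xi_{s}^{2}+\frac{\alpha}{\alpha+\beta+\gamma-1}\,\xi_{\beta+\gamma}^{2}\right\}. \tag{38}$$
With the further scaling transformation
$$\xi_{s}^{*} = \left(\frac{\alpha+\beta+\gamma}{\alpha+\beta+\gamma-1}\right)^{1/2}\xi_{s}\quad (s=1,2,\ldots,\beta+\gamma-1),\qquad \xi_{\beta+\gamma}^{*} = \left(\frac{\alpha}{\alpha+\beta+\gamma-1}\right)^{1/2}\xi_{\beta+\gamma}, \tag{39}$$
$$Q(y_{1},y_{2},\ldots,y_{\beta+\gamma}) = (\alpha+\beta+\gamma-1)\sum_{i=1}^{\beta+\gamma}\xi_{i}^{*2}. \tag{40}$$
The relation between the $y_{i}$ and the $\xi_{i}^{*}$ is provided by
$$y_{i} = \left(\frac{\alpha+\beta+\gamma-1}{\alpha+\beta+\gamma}\right)^{1/2}\sum_{s=1}^{\beta+\gamma-1}b_{is}\,\xi_{s}^{*}+\left(\frac{\alpha+\beta+\gamma-1}{\alpha}\right)^{1/2}b_{i,\beta+\gamma}\,\xi_{\beta+\gamma}^{*}\qquad (i=1,2,\ldots,\beta+\gamma). \tag{41}$$
Defining
$$\begin{aligned}
L_{i} &= -\left(\frac{\alpha+\beta+\gamma-1}{\alpha+\beta+\gamma}\right)^{1/2}\sum_{s=1}^{\beta+\gamma-1}b_{is}\,\xi_{s}^{*}-\left(\frac{\alpha+\beta+\gamma-1}{\alpha}\right)^{1/2}b_{i,\beta+\gamma}\,\xi_{\beta+\gamma}^{*} &&(i=1,2,\ldots,\beta),\\
L_{i} &= \left(\frac{\alpha+\beta+\gamma-1}{\alpha+\beta+\gamma}\right)^{1/2}\sum_{s=1}^{\beta+\gamma-1}b_{is}\,\xi_{s}^{*}+\left(\frac{\alpha+\beta+\gamma-1}{\alpha}\right)^{1/2}b_{i,\beta+\gamma}\,\xi_{\beta+\gamma}^{*} &&(i=\beta+1,\beta+2,\ldots,\beta+\gamma),
\end{aligned} \tag{42}$$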
the region of integration in the $y$-space, $y_{i}\le 0$ $(i=1,2,\ldots,\beta)$, $y_{i}\ge 0$ $(i=\beta+1,\beta+2,\ldots,\beta+\gamma)$, transforms into the following region $R$ in the $\xi^{*}$-space, demarcated by the bounding planes $L_{i}=0$:
$$L_{i}\ge 0\qquad (i=1,2,\ldots,\beta+\gamma).$$

Denote the plane angle interior to $R$ and formed by the bounding planes $L_{i}$ and $L_{j}$ $(j\neq i)$ by $(ij)$.
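There is no loss of generality if we set
$$b_{j,\beta+\gamma} = \frac{1}{(\beta+\gamma)^{1/2}}\qquad (j=1,2,3,\ldots,\beta+\gamma). \tag{43}$$
On using (43) together with the orthogonality relationships
$$\sum_{s=1}^{\beta+\gamma}b_{js}^{2} = 1,\qquad \sum_{s=1}^{\beta+\gamma}b_{1s}b_{js} = 0,$$
we find
$$\cos(1j) = -\frac{-\dfrac{1}{\beta+\gamma}\,\dfrac{\alpha+\beta+\gamma-1}{\alpha+\beta+\gamma}+\dfrac{1}{\beta+\gamma}\,\dfrac{\alpha+\beta+\gamma-1}{\alpha}}{\left(1-\dfrac{1}{\beta+\gamma}\right)\dfrac{\alpha+\beta+\gamma-1}{\alpha+\beta+\gamma}+\dfrac{1}{\beta+\gamma}\,\dfrac{\alpha+\beta+\gamma-1}{\alpha}} = -\frac{1}{1+\alpha}\qquad (j=2,3,\ldots,\beta).$$
Similarly,
$$\cos(1k) = \frac{1}{1+\alpha}\qquad (k=\beta+1,\beta+2,\ldots,\beta+\gamma).$$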
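In exactly the same way it may be shown that
$$\cos(ij) = \begin{cases} -\dfrac{1}{1+\alpha}, & i,j \text{ together in either of the blocks } (1,2,\ldots,\beta),\ (\beta+1,\beta+2,\ldots,\beta+\gamma),\ j\neq i,\\[2ex] \phantom{-}\dfrac{1}{1+\alpha}, & i,j \text{ in different blocks } (1,2,\ldots,\beta),\ (\beta+1,\beta+2,\ldots,\beta+\gamma), \end{cases} \tag{44}$$
on using the orthogonality relationships
$$\sum_{s=1}^{\beta+\gamma}b_{is}b_{js} = 0\quad (j\neq i),\qquad = 1\quad (j=i).$$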
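The Jacobian of the transformation relating the $y_{i}$ to the $\xi_{i}^{*}$ is
$$\left|\frac{\partial(y_{1},y_{2},\ldots,y_{\beta+\gamma})}{\partial(\xi_{1}^{*},\xi_{2}^{*},\ldots,\xi_{\beta+\gamma}^{*})}\right| = \left(\frac{\alpha+\beta+\gamma-1}{\alpha+\beta+\gamma}\right)^{\frac12(\beta+\gamma-1)}\left(\frac{\alpha+\beta+\gamma-1}{\alpha}\right)^{1/2}.$$
Hence
$$\phi(0;\alpha;\beta,\gamma) = \frac{1}{(\alpha+\beta+\gamma)^{1/2}(2\pi)^{\frac12(\alpha+\beta+\gamma-1)}}\left(\frac{\alpha+\beta+\gamma-1}{\alpha+\beta+\gamma}\right)^{\frac12(\beta+\gamma-1)}\left(\frac{\alpha+\beta+\gamma-1}{\alpha}\right)^{1/2}\int_{R}\exp\left[-\frac{\alpha+\beta+\gamma-1}{2(\alpha+\beta+\gamma)}\sum_{i=1}^{\beta+\gamma}\xi_{i}^{*2}\right]\prod_{i=1}^{\beta+\gamma}d\xi_{i}^{*}. \tag{45}$$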


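Finally, apply the spherical polar transformation
$$\begin{aligned}
\xi_{i}^{*} &= r\sin\chi_{1}\sin\chi_{2}\ldots\sin\chi_{i-1}\cos\chi_{i}\qquad (i=1,2,\ldots,\beta+\gamma-1),\\
\xi_{\beta+\gamma}^{*} &= r\sin\chi_{1}\sin\chi_{2}\ldots\sin\chi_{\beta+\gamma-1}.
\end{aligned} \tag{46}$$
Then
$$\phi(0;\alpha;\beta,\gamma) = \frac{1}{(\alpha+\beta+\gamma)^{1/2}(2\pi)^{\frac12(\alpha+\beta+\gamma-1)}}\left(\frac{\alpha+\beta+\gamma-1}{\alpha+\beta+\gamma}\right)^{\frac12(\beta+\gamma-1)}\left(\frac{\alpha+\beta+\gamma-1}{\alpha}\right)^{1/2}\iint\exp\left[-\frac{\alpha+\beta+\gamma-1}{2(\alpha+\beta+\gamma)}\,r^{2}\right]r^{\beta+\gamma-1}\,dr\,d\omega, \tag{47}$$
where $d\omega$ represents an infinitesimal solid angle in a space of $\beta+\gamma$ dimensions:
$$d\omega = \sin^{\beta+\gamma-2}\chi_{1}\,\sin^{\beta+\gamma-3}\chi_{2}\ldots\sin^{2}\chi_{\beta+\gamma-3}\,\sin\chi_{\beta+\gamma-2}\,d\chi_{1}\,d\chi_{2}\ldots d\chi_{\beta+\gamma-1}. \tag{48}$$
Integrating with respect to $r$ $(0<r<\infty)$,
$$\phi(0;\alpha;\beta,\gamma) = \frac{2^{\frac12(\beta+\gamma)-1}\,\Gamma\{\tfrac12(\beta+\gamma)\}}{(2\pi)^{\frac12(\alpha+\beta+\gamma-1)}\,\alpha^{1/2}}\int_{S}d\omega, \tag{49}$$
$S$ being the region on the unit sphere $\sum_{1}^{\beta+\gamma}\xi_{i}^{*2}=1$ which is contained within the region $R$, so that $\int_{S}d\omega$ is simply the area of that portion of the unit sphere.† Denote the ratio which this area bears to the area of the unit sphere by $V_{\beta,\beta+\gamma}\bigl(\cos^{-1}\bigl(-\tfrac{1}{1+\alpha}\bigr)\bigr)$. Equation (49) may then be rewritten in the form
$$\phi(0;\alpha;\beta,\gamma) = \frac{1}{(2\pi)^{\frac12(\alpha-1)}\,\alpha^{1/2}}\;V_{\beta,\beta+\gamma}\!\left(\cos^{-1}\left(-\frac{1}{1+\alpha}\right)\right). \tag{50}$$

† $S$ is a (hyper)spherical simplex. For a discussion on hyperspherical simplices and their properties see, for example, Sommerville (1929) and Coxeter (1948).
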
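The following result consequential on (50) is noted. The moment-generating function of the square of the $r$th order statistic, counting from the lowest member, in a random sample of $n$ items from a normal population is provided by
$$\begin{aligned}
\Phi_{r|n}(\theta) &= \mathcal{E}(e^{\theta x^{2}})\\
&= \mathcal{E}(z^{-2\theta})\,(2\pi)^{-\theta}\\
&= (2\pi)^{-\theta}\,\frac{n!}{(r-1)!\,(n-r)!}\int_{-\infty}^{\infty}z^{1-2\theta}p^{r-1}q^{n-r}\,dx\\
&= (2\pi)^{-\theta}\,\frac{n!}{(r-1)!\,(n-r)!}\,\phi(0;1-2\theta;r-1,n-r)\qquad (\theta<\tfrac12)\\
&= \frac{n!}{(r-1)!\,(n-r)!}\,V_{r-1,n-1}\!\left(\cos^{-1}\left(-\frac{1}{2-2\theta}\right)\right)(1-2\theta)^{-\frac12}\qquad (n=2,3,\ldots;\ r=1,2,\ldots,n).
\end{aligned} \tag{51}$$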
The interesting property follows from (51) that the square of any order statistic in samples from a normal population is distributed as the sum of two independent variates, one of which is a $\chi^{2}$ on one degree of freedom and the other a variate which has a moment-generating function provided by the first factor on the right-hand side of (51). In particular,
$$\Phi_{r|2}(\theta) = (1-2\theta)^{-\frac12}\qquad (r=1,2), \tag{52}$$
i.e. the even moments of either the smaller or of the greater member in a random sample of two items from a normal population are identical with the even moments relating to a sample of one item. Furthermore,
$$\Phi_{1|3}(\theta) = \frac{3\cos^{-1}\left(-\dfrac{1}{2-2\theta}\right)}{2\pi}\,(1-2\theta)^{-\frac12} = \Phi_{3|3}(\theta), \tag{53}$$
$$\Phi_{2|3}(\theta) = 3\left\{1-\frac{\cos^{-1}\left(-\dfrac{1}{2-2\theta}\right)}{\pi}\right\}(1-2\theta)^{-\frac12}, \tag{54}$$
$$\Phi_{1|4}(\theta) = \left\{\frac{3\cos^{-1}\left(-\dfrac{1}{2-2\theta}\right)}{\pi}-1\right\}(1-2\theta)^{-\frac12} = \Phi_{4|4}(\theta), \tag{55}$$
$$\Phi_{2|4}(\theta) = 3\left\{1-\frac{\cos^{-1}\left(-\dfrac{1}{2-2\theta}\right)}{\pi}\right\}(1-2\theta)^{-\frac12} = \Phi_{3|4}(\theta). \tag{56}$$


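The results (52)-(56) follow from (51) on noting that
$$V_{0,1}(\theta) = \tfrac12 = V_{1,1}(\theta), \tag{57}$$
$$V_{0,2}(\theta) = \frac{\theta}{2\pi} = V_{2,2}(\theta), \tag{58}$$
$$V_{1,2}(\theta) = \frac12\left(1-\frac{\theta}{\pi}\right), \tag{59}$$
$$V_{0,3}(\theta) = \frac14\left(\frac{3\theta}{\pi}-1\right) = V_{3,3}(\theta), \tag{60}$$
$$V_{1,3}(\theta) = \frac14\left(1-\frac{\theta}{\pi}\right) = V_{2,3}(\theta). \tag{61}$$
Equation (51) is also to be regarded as valid for $n=1$ provided $V_{0,0}(\theta)$ is interpreted as 1.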
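
Equation (50), combined with the closed forms (57)-(61), can be checked numerically for $\beta+\gamma\le 3$. The sketch below does so by Simpson's rule, assuming $\phi(0;\alpha;\beta,\gamma)=\int z^{\alpha}p^{\beta}q^{\gamma}\,dx$ and the angle $\cos^{-1}\{-1/(1+\alpha)\}$; the function names and quadrature settings are illustrative choices rather than anything prescribed in the paper.

```python
import math


def z(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def p(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi0(alpha, beta, gamma, lo=-12.0, hi=12.0, n=4000):
    """Simpson's rule for the integral of z^alpha p^beta q^gamma."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        px = p(x)
        total += w * z(x)**alpha * px**beta * (1.0 - px)**gamma
    return total * h / 3.0


def V_closed(beta, n, theta):
    """Closed forms (57)-(61) for V_{beta,n}(theta), n = 1, 2, 3."""
    if n == 1:
        return 0.5
    if n == 2:
        return theta / (2 * math.pi) if beta in (0, 2) else 0.5 * (1 - theta / math.pi)
    if n == 3:
        return 0.25 * (3 * theta / math.pi - 1) if beta in (0, 3) else 0.25 * (1 - theta / math.pi)
    raise ValueError("closed forms are only available for n <= 3")


def phi0_from_50(alpha, beta, gamma):
    """Right-hand side of (50)."""
    theta = math.acos(-1.0 / (1.0 + alpha))
    return (2 * math.pi) ** (-0.5 * (alpha - 1)) * alpha ** -0.5 * V_closed(beta, beta + gamma, theta)


if __name__ == "__main__":
    for alpha in (1.0, 2.5):
        for beta, gamma in [(1, 0), (1, 1), (2, 0), (1, 2), (3, 0)]:
            print(alpha, beta, gamma, phi0(alpha, beta, gamma), phi0_from_50(alpha, beta, gamma))
```

Non-integral values of $\alpha$ are included deliberately, since (51) uses (50) with $\alpha=1-2\theta$.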




H. Rusen 213 


The equations (57)-(61) express the fact that the surface contents of simplices constructed on 'spheres' which are immersed in one, two and three dimensions are, respectively, a point, an arc of a circle and a spherical triangle, the area of the latter being provided by the spherical excess. For dimensionality greater than three (spherical tetrahedra, spherical pentahedra, etc.) the areas can no longer be expressed in terms of elementary functions (§3). It will appear subsequently ((65) and (66)) that the odd moments of order statistics in samples of 2, 3 and 4 items, drawn from a normal population, can also be expressed in terms of elementary trigonometric quantities. This throws some light on the fact that some at any rate of the first few moments in samples of size $n\le 4$ from normal populations have hitherto been available (see, for example, Jones (1948)) but not those relating to samples of size $n>4$.

We note also that the equation (51) may be used to derive the even moments of the quantities, and, in particular, all the moments of the median (the odd moments of the latter being zero) in samples drawn from normal populations.

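Reverting now to the question of the general moments, we substitute in (23) the expression given for $\phi(0;\alpha;\beta,\gamma)$ in (50), obtaining
$$\phi(2k+1;\alpha;\beta,\gamma) = \sum_{i=0}^{k}a_{2k+1,2i+1}(\alpha)\,\frac{1}{(2\pi)^{\frac12(2i+\alpha)}\,(2i+\alpha+1)^{1/2}}\,\frac{1}{\alpha(\alpha+1)\cdots(\alpha+2i)}\,L^{2i+1}V_{\beta,\beta+\gamma}\!\left(\cos^{-1}\left(-\frac{1}{2i+\alpha+2}\right)\right)\qquad (k=0,1,2,\ldots), \tag{62}$$
$$\phi(2k;\alpha;\beta,\gamma) = \sum_{i=0}^{k}a_{2k,2i}(\alpha)\,\frac{1}{(2\pi)^{\frac12(2i+\alpha-1)}\,(2i+\alpha)^{1/2}}\,\frac{1}{\alpha(\alpha+1)\cdots(\alpha+2i-1)}\,L^{2i}V_{\beta,\beta+\gamma}\!\left(\cos^{-1}\left(-\frac{1}{2i+\alpha+1}\right)\right)\qquad (k=0,1,2,\ldots), \tag{63}$$
the product $\alpha(\alpha+1)\cdots(\alpha+2i-1)$ being taken as unity when $i=0$.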


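On substituting further $\alpha=1$, the moments of the $r$th order statistic, in a sample of size $n$ from a normal distribution, defined by
$$\mu'_{k|r|n} = n\binom{n-1}{r-1}\int_{-\infty}^{\infty}x^{k}\,z\,p^{r-1}q^{n-r}\,dx\qquad (k=1,2,\ldots), \tag{64}$$
are provided by the following double series:
$$\mu'_{2k+1|r|n} = n\binom{n-1}{r-1}\sum_{i=0}^{k}a_{2k+1,2i+1}(1)\,\frac{1}{(2\pi)^{\frac12(2i+1)}\,(2i+2)^{1/2}\,(2i+1)!}\,L^{2i+1}V_{r-1,n-1}\!\left(\cos^{-1}\left(-\frac{1}{2i+3}\right)\right)\qquad (k=0,1,2,\ldots), \tag{65}$$
$$\mu'_{2k|r|n} = n\binom{n-1}{r-1}\sum_{i=0}^{k}a_{2k,2i}(1)\,\frac{1}{(2\pi)^{i}\,(2i+1)^{1/2}\,(2i)!}\,L^{2i}V_{r-1,n-1}\!\left(\cos^{-1}\left(-\frac{1}{2i+2}\right)\right)\qquad (k=0,1,2,\ldots), \tag{66}$$
having put
$$\beta = r-1,\qquad \gamma = n-r. \tag{67}$$
In (65) and in (66) it is to be remembered, as in (20), that
$$L^{m}V_{\beta,\beta+\gamma}(\theta) = 0\qquad (m=\beta+\gamma+1,\ \beta+\gamma+2,\ \ldots).$$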
The character of the double series in (65) and in (66) may be represented graphically. For
$$L^{m}V_{\beta,\beta+\gamma}(\theta) = \sum_{j=0}^{m}(-1)^{j}\binom{m}{j}\beta^{(m-j)}\gamma^{(j)}\,V_{\beta-m+j,\,\beta+\gamma-m}(\theta), \tag{68}$$
and therefore the typical term on the right-hand side of (68) does not vanish, if, and only if, $m-\beta\le j\le\gamma$.
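
The expansion (68) is easy to tabulate. The sketch below lists the non-vanishing terms of $L^{m}V_{\beta,\beta+\gamma}$ as a dictionary keyed by the subscripts of the resulting $V$; it assumes that $\beta^{(r)}$ denotes the descending factorial $\beta(\beta-1)\cdots(\beta-r+1)$, which is the reading consistent with (20) and (88). The function names are illustrative.

```python
from math import comb


def falling(n, r):
    """Descending factorial n(n-1)...(n-r+1); equals 1 when r = 0."""
    out = 1
    for i in range(r):
        out *= n - i
    return out


def L_power_terms(m, beta, gamma):
    """Non-zero terms of L^m V_{beta,beta+gamma} as {(beta', n'): coefficient}, following (68)."""
    terms = {}
    for j in range(m + 1):
        coef = (-1) ** j * comb(m, j) * falling(beta, m - j) * falling(gamma, j)
        if coef:
            terms[(beta - m + j, beta + gamma - m)] = coef
    return terms


if __name__ == "__main__":
    # beta = 2, gamma = 1 (n = 4, r = 3): terms of L^2 V_{2,3}.
    print(L_power_terms(2, 2, 1))
    # m > beta + gamma: every term vanishes, as in (20).
    print(L_power_terms(4, 2, 1))
```

For $\beta=2$, $\gamma=1$ and $m=2$ the surviving subscripts are $(0,1)$ and $(1,1)$, the same two functions that appear in the worked example for $n=4$, $r=3$.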

Let $(\beta',\gamma')$ represent the co-ordinates of a point relative to two axes in a plane. Associate with each such point the function $V_{\beta',\beta'+\gamma'}$. Consider the non-negative integral points contained within and on the sides of the parallelogram with vertices $(0,0)$, $(\beta,0)$, $(0,\gamma)$, $(\beta,\gamma)$. Then the value of the series in (68) is obtained by summation along the non-negative integral points on that portion of the line $\beta'+\gamma'=\beta+\gamma-m$ which lies within and on the sides of the parallelogram. Correspondingly, the values of the series in (65) and in (66) are obtained by summation along a set of such alternately spaced lines. For the extreme members, $\gamma=0$, and the parallelogram shrinks into the line with end-points $(0,0)$ and $(\beta,0)$. Summation is then effected along the appropriate non-negative integral points on this line segment (see equations (100) and (101)).


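[Figure: the parallelogram with vertices $(\beta,\gamma)$, $(\beta,0)$, $(0,\gamma)$ and $(0,0)$, crossed by the lines of summation $m=0$ (through $(\beta,\gamma)$), $m=1$, ..., $m=\gamma$ (through $(\beta,0)$), ..., $m=\beta$ (through $(0,\gamma)$), ..., $m=\beta+\gamma$ (through $(0,0)$).]

Illustrating the character, and possible mode of summation, of the series in equations (65), (66) and (68).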


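It is worth noting from (50) that
$$V_{\beta,\beta+\gamma}(\tfrac23\pi) = \frac{1}{\beta+\gamma+1}\,\frac{\beta!\,\gamma!}{(\beta+\gamma)!}, \tag{69}$$
and in particular,
$$V_{\beta,\beta}(\tfrac23\pi) = \frac{1}{\beta+1}. \tag{70}$$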


The relationship (70) is readily interpreted. For $V_{\beta,\beta}(\tfrac23\pi)$ represents the relative content of a regular hyperspherical simplex, with primary bounding angles equal to $\tfrac23\pi$, constructed on a sphere immersed in a space of $\beta$ dimensions. Inscribe a regular generalized tetrahedron (with the angle between any two faces equal to $\cos^{-1}1/\beta$) within the sphere. This generalized tetrahedron may be divided into $\beta+1$ equal simplices by a set of diametral planes which are inclined to each other at an angle of $\tfrac23\pi$, irrespective of the dimensionality $\beta$. Hence this set of planes divides also the surface of the sphere into $\beta+1$ equal and regular hyperspherical simplices.

For purposes of illustration we end this section by considering the trivial yet illuminating case where $n=4$, $r=3$ ($\beta=2$, $\gamma=1$) and $k=1$ in (66). This will show how the exact values of all the moments of order statistics in samples of size $n\le 4$, some of which have been obtained by Jones (1948), may be derived immediately from the fundamental formulae (65) and (66) once the values of the coefficients $a(1)$ are known. At the same time it will illustrate the use of these formulae in more difficult cases.
On substitution in (66),
$$\begin{aligned}
\mu'_{2|3|4} &= 4\binom{3}{2}\left\{a_{2,0}(1)\,V_{2,3}(\cos^{-1}-\tfrac12)+a_{2,2}(1)\,\frac{1}{2\pi}\,\frac{1}{\sqrt3}\,\frac{1}{2!}\,L^{2}V_{2,3}(\cos^{-1}-\tfrac14)\right\}\\
&= 4\binom{3}{2}\left\{\frac14\,\frac{2!\,1!}{3!}+\frac{1}{4\pi\sqrt3}\bigl[2\cdot1\cdot V_{0,1}(\cos^{-1}-\tfrac14)-2\cdot2\cdot1\cdot V_{1,1}(\cos^{-1}-\tfrac14)+1\cdot0\cdot V_{2,1}(\cos^{-1}-\tfrac14)\bigr]\right\},
\end{aligned}$$
having used equation (69). One would expect to interpret $V_{2,1}(\theta)$ as being identically zero (more generally, $V_{h,k}(\theta)\equiv 0$, $h>k$). However, such an interpretation is not strictly needed, since $V_{2,1}(\cos^{-1}-\tfrac14)$ is preceded in the above expansion by a zero coefficient (more generally, $V_{h,k}(\theta)$ will always be preceded in the series expansion for the moments by a zero coefficient whenever $h>k$). Using the values for $V_{0,1}(\cos^{-1}-\tfrac14)$ and $V_{1,1}(\cos^{-1}-\tfrac14)$ as given in equation (57), we obtain
$$\mu'_{2|3|4} = 1-\sqrt3/\pi,$$
in agreement with Jones's result.
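
The whole chain (recurrence for the $a$'s, the expansion (68), the closed forms (57)-(61), and the series (65) and (66)) can be put together for samples of size $n\le 4$. The following is a sketch under stated assumptions rather than the author's computing procedure: `moment(k, r, n)` is a hypothetical helper that treats odd and even $k$ uniformly by summing over $m=2i+1$ or $m=2i$, and $\beta^{(r)}$ is read as a descending factorial.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def a(s, j, alpha):
    """a_{s,j}(alpha) from the combined recurrences (24)-(28)."""
    if j < 0 or j > s or (s - j) % 2:
        return 0.0
    if j == s:
        return 1.0
    return (s - 1) / alpha * a(s - 2, j, alpha) + a(s - 1, j - 1, alpha + 1)


def falling(n, r):
    out = 1
    for i in range(r):
        out *= n - i
    return out


def V(beta, n, theta):
    """V_{beta,n}(theta) for n <= 3, using (57)-(61) and V_{0,0} = 1."""
    if n == 0:
        return 1.0
    if n == 1:
        return 0.5
    if n == 2:
        return theta / (2 * math.pi) if beta in (0, 2) else 0.5 * (1 - theta / math.pi)
    if n == 3:
        return 0.25 * (3 * theta / math.pi - 1) if beta in (0, 3) else 0.25 * (1 - theta / math.pi)
    raise ValueError("closed forms are only available for n <= 3")


def L_power_V(m, beta, gamma, theta):
    """L^m V_{beta,beta+gamma}(theta) by (68); zero-coefficient terms are skipped."""
    total = 0.0
    for j in range(m + 1):
        c = (-1) ** j * math.comb(m, j) * falling(beta, m - j) * falling(gamma, j)
        if c:
            total += c * V(beta - m + j, beta + gamma - m, theta)
    return total


def moment(k, r, n):
    """k-th moment about the origin of the r-th order statistic, n <= 4, via (65)/(66)."""
    beta, gamma = r - 1, n - r
    total = 0.0
    for m in range(k % 2, k + 1, 2):
        theta = math.acos(-1.0 / (m + 2))
        weight = (2 * math.pi) ** (-0.5 * m) / math.sqrt(m + 1) / math.factorial(m)
        total += a(k, m, 1.0) * weight * L_power_V(m, beta, gamma, theta)
    return n * math.comb(n - 1, r - 1) * total


if __name__ == "__main__":
    print(moment(2, 3, 4), 1 - math.sqrt(3) / math.pi)
    print([moment(1, r, 4) for r in range(1, 5)])
```

A useful internal check is that the second moments of the four order statistics for $n=4$ add up to 4, since together they account for the second moments of four standard normal observations.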
The terms needed in the expansion can be obtained more expeditiously from the parallelogram diagram, the coefficients being derived from the values of the $a(1)$'s and by the use of (68). The parallelogram appropriate to the case $\beta=2$, $\gamma=1$ is depicted below:

(2, 1) ............ m = 0
(2, 0), (1, 1) .... m = 1
(1, 0), (0, 1) .... m = 2
(0, 0) ............ m = 3

The points along which summation is effected are $(2,1)$ $(m=0)$, and $(0,1)$ and $(1,0)$ $(m=2)$. The corresponding $V$'s are $V_{2,3}$, and $V_{0,1}$ and $V_{1,1}$ respectively, as in the above expansion for $\mu'_{2|3|4}$.

3. THE CONTENT, $V_{\beta,\beta+\gamma}(\theta)$, OF A TYPE OF HYPERSPHERICAL SIMPLEX

The problem of determining the area of the general hyperspherical simplex is one of great complexity, even when the number of dimensions is as low as 4 (Hoppe, 1882; Richmond, 1903; Coxeter, 1935), and little progress appears to have been made in this direction. However, if we confine ourselves to a singly-infinite class of hyperspherical simplices, the problem becomes manageable although new types of transcendental functions are needed in the solution. §§1 and 2 indicate that for the full development of the theory of order statistics in samples from normal populations, the investigation of the properties of such a class of simplices is essential.

The starting point for this investigation is Schläfli's (1950) differential recurrence relation for the area $S_{n'}$ of the general simplex constructed on the surface of a sphere in $n'$ dimensions:
$$dS_{n'} = \frac{1}{n'-2}\bigl\{S(12)\,d(12)+S(13)\,d(13)+\ldots+S[(n'-1)n']\,d[(n'-1)n']\bigr\}\qquad (n'>2), \tag{71}$$
in which $(ij)$ $(j\neq i)$ denotes the edge, of dimensionality two less than that of the original simplex, formed by the $i$th and $j$th flats and $S(ij)$ its content (e.g. for $n'=5$, $S(ij)$ is the area of a spherical triangle). More generally, denote the edge of the simplex, of dimensionality $m$ less than that of the original simplex, formed jointly by the first, second, ..., $m$th flats, by $(123\ldots m)$, and a typical primary angle of this simplex by $(123\ldots m\,ik)$. These angles may be expressed in terms of the determinants of certain bordered matrices with the basic primary angles as elements. In fact (Schläfli, 1950, p. 62)
$$\cos(12\ldots m\,ik) = -\frac{\Delta\!\left(\begin{smallmatrix}i\\k\end{smallmatrix};\,123\ldots m\right)}{\{\Delta(i123\ldots m)\,\Delta(k123\ldots m)\}^{1/2}}, \tag{72}$$
with
$$\Delta\!\left(\begin{smallmatrix}i\\k\end{smallmatrix};\,123\ldots m\right) = \begin{vmatrix}
-\cos(ik) & -\cos(i1) & -\cos(i2) & -\cos(i3) & \cdots & -\cos(im)\\
-\cos(1k) & 1 & -\cos(12) & -\cos(13) & \cdots & -\cos(1m)\\
-\cos(2k) & -\cos(21) & 1 & -\cos(23) & \cdots & -\cos(2m)\\
-\cos(3k) & -\cos(31) & -\cos(32) & 1 & \cdots & -\cos(3m)\\
\vdots & \vdots & \vdots & \vdots & & \vdots\\
-\cos(mk) & -\cos(m1) & -\cos(m2) & -\cos(m3) & \cdots & 1
\end{vmatrix}, \tag{73}$$
$$\Delta(i123\ldots m) = \begin{vmatrix}
1 & -\cos(i1) & -\cos(i2) & -\cos(i3) & \cdots & -\cos(im)\\
-\cos(1i) & 1 & -\cos(12) & -\cos(13) & \cdots & -\cos(1m)\\
-\cos(2i) & -\cos(21) & 1 & -\cos(23) & \cdots & -\cos(2m)\\
-\cos(3i) & -\cos(31) & -\cos(32) & 1 & \cdots & -\cos(3m)\\
\vdots & \vdots & \vdots & \vdots & & \vdots\\
-\cos(mi) & -\cos(m1) & -\cos(m2) & -\cos(m3) & \cdots & 1
\end{vmatrix}. \tag{74}$$
In terms of relative areas (71) may be written
$$dV_{n'} = \frac{1}{2\pi}\sum_{i}\sum_{j>i}V_{n'-2}(ij)\,d(ij)\qquad (n'\ge 2), \tag{71'}$$
with an obvious slight change in notation. In this standardized form the relationship holds also for $n'=2$ provided $V_{0}(\theta)$ is as above interpreted as being 1.

The simplex under investigation is one constructed on a sphere immersed in a space of $\beta+\gamma$-dimensionality, and one demarcated by $\beta+\gamma$ $(\beta+\gamma-1)$-flats which may be divided into two sets $(1,2,3,\ldots,\beta)$ and $(\beta+1,\beta+2,\ldots,\beta+\gamma)$ in such a way that
$$(ij) = \begin{cases}\theta, & i,j \text{ in same set},\\ \pi-\theta, & i,j \text{ in opposite sets}\end{cases}\qquad (\tfrac12\pi<\theta<\pi). \tag{75}$$


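From (72), (73) and (74),
$$\cos(ij\,\sigma\tau) = -\frac{\Delta\!\left(\begin{smallmatrix}\sigma\\ \tau\end{smallmatrix};\,ij\right)}{\{\Delta(\sigma ij)\,\Delta(\tau ij)\}^{1/2}}, \tag{76}$$
with
$$\Delta\!\left(\begin{smallmatrix}\sigma\\ \tau\end{smallmatrix};\,ij\right) = \begin{vmatrix}-\cos(\sigma\tau) & -\cos(\sigma i) & -\cos(\sigma j)\\ -\cos(i\tau) & 1 & -\cos(ij)\\ -\cos(j\tau) & -\cos(ji) & 1\end{vmatrix}, \tag{77}$$
$$\Delta(\sigma ij) = \begin{vmatrix}1 & -\cos(\sigma i) & -\cos(\sigma j)\\ -\cos(i\sigma) & 1 & -\cos(ij)\\ -\cos(j\sigma) & -\cos(ji) & 1\end{vmatrix}, \tag{78}$$
$$\Delta(\tau ij) = \begin{vmatrix}1 & -\cos(\tau i) & -\cos(\tau j)\\ -\cos(i\tau) & 1 & -\cos(ij)\\ -\cos(j\tau) & -\cos(ji) & 1\end{vmatrix}. \tag{78'}$$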





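There are sixteen possibilities to consider, for $i$, $j$, $\sigma$ and $\tau$ may each be in either of the two sets. A typical example, appropriate to the case wherein $i$, $j$, $\sigma$ and $\tau$ are all in the first (or all in the second) set, gives
$$\Delta\!\left(\begin{smallmatrix}\sigma\\ \tau\end{smallmatrix};\,ij\right) = \begin{vmatrix}-\lambda & -\lambda & -\lambda\\ -\lambda & 1 & -\lambda\\ -\lambda & -\lambda & 1\end{vmatrix} = -\lambda(1+\lambda)^{2},$$
$$\Delta(\sigma ij) = \begin{vmatrix}1 & -\lambda & -\lambda\\ -\lambda & 1 & -\lambda\\ -\lambda & -\lambda & 1\end{vmatrix} = \Delta(\tau ij) = (1+\lambda)^{2}(1-2\lambda),$$
where
$$\lambda = \cos\theta. \tag{79}$$
Hence, in this case,
$$\cos(ij\,\sigma\tau) = \frac{\lambda}{1-2\lambda}.$$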


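Similar examination of the third-order determinants involved reveals that
$$\cos(ij\,\sigma\tau) = \begin{cases}\phantom{-}\dfrac{\lambda}{1-2\lambda} = \cos(\phi(\theta)); & \sigma,\tau \text{ in same set, all } i,j,\ j\neq i,\ \tau\neq\sigma,\\[2ex] -\dfrac{\lambda}{1-2\lambda} = \cos(\pi-\phi(\theta)); & \sigma,\tau \text{ in opposite sets, all } i,j,\ j\neq i.\end{cases} \tag{80}$$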
Consider $(ij)$, where $i$ and $j$ belong to the first set $(1,2,\ldots,\beta)$ with $j\neq i$. This is a skew simplex of the type already discussed with dimensionality reduced by two. It may be regarded as demarcated by $\beta+\gamma-2$ $(\beta+\gamma-3)$-flats which divide into two groups $(1,2,\ldots,i-1,i+1,\ldots,j-1,j+1,\ldots,\beta)$, $(\beta+1,\beta+2,\ldots,\beta+\gamma)$ such that the angle between any two distinct flats in the same group is $\phi(\theta)$ and that between any two flats in distinct groups is $\pi-\phi(\theta)$. Here also $d(ij)=d\theta$. Consider now $(ij)$, where $i$ and $j$ belong to the second set $(\beta+1,\beta+2,\ldots,\beta+\gamma)$ with $j\neq i$. This is also a skew simplex of the type already discussed with dimensionality reduced by two. It may be regarded as demarcated by $\beta+\gamma-2$ $(\beta+\gamma-3)$-flats which divide into two groups $(1,2,\ldots,\beta)$, $(\beta+1,\ldots,i-1,i+1,\ldots,j-1,j+1,\ldots,\beta+\gamma)$, such that the angle between any two distinct flats in the same group is $\phi(\theta)$ and that between any two flats in different groups is $\pi-\phi(\theta)$. Here, likewise, $d(ij)=d\theta$. Consider, finally, $(ij)$, where $i$ and $j$ are in opposite sets. This is once again a skew simplex of the type already discussed with dimensionality reduced by two. It may be regarded as demarcated by $\beta+\gamma-2$ $(\beta+\gamma-3)$-flats which divide into two groups $(1,2,\ldots,i-1,i+1,\ldots,\beta)$, $(\beta+1,\beta+2,\ldots,j-1,j+1,\ldots,\beta+\gamma)$ such that the angle between any two distinct flats in the same group is $\phi(\theta)$ and that between any two flats in different groups is $\pi-\phi(\theta)$. Here, $d(ij)=-d\theta$.

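Combining these results and applying them in (71'),
$$dV_{\beta,\beta+\gamma}(\theta) = \frac{1}{2\pi}\left\{\binom{\beta}{2}V_{\beta-2,\beta+\gamma-2}(\phi(\theta))-\beta\gamma\,V_{\beta-1,\beta+\gamma-2}(\phi(\theta))+\binom{\gamma}{2}V_{\beta,\beta+\gamma-2}(\phi(\theta))\right\}d\theta. \tag{81}$$
Hence
$$V_{\beta,\beta+\gamma}(\theta) = \frac{1}{2^{\beta+\gamma}}+\frac{1}{2\pi}\int_{\frac12\pi}^{\theta}\left\{\binom{\beta}{2}V_{\beta-2,\beta+\gamma-2}(\phi(\theta'))-\beta\gamma\,V_{\beta-1,\beta+\gamma-2}(\phi(\theta'))+\binom{\gamma}{2}V_{\beta,\beta+\gamma-2}(\phi(\theta'))\right\}d\theta'\qquad (\beta,\gamma=0,1,2,\ldots). \tag{82}$$
Symbolically,
$$\begin{aligned}
V_{\beta,\beta+\gamma}(\theta) &= \frac{1}{2^{\beta+\gamma}}+\frac{1}{4\pi}\int_{\frac12\pi}^{\theta}L^{2}\bigl(V_{\beta,\beta+\gamma}(\phi(\theta'))\bigr)\,d\theta'\\
&= \frac{1}{2^{\beta+\gamma}}+\frac{1}{4\pi}\,L^{2}\int_{\frac12\pi}^{\theta}V_{\beta,\beta+\gamma}(\phi(\theta'))\,d\theta'\qquad (\beta,\gamma=0,1,2,\ldots).
\end{aligned} \tag{83}$$
Equation (83) may be used to express $V_{\beta,\beta+\gamma}(\theta)$ in terms of a set of auxiliary functions defined by
$$\psi_{0}(\theta) = 1,\qquad \psi_{k}(\theta) = \frac{1}{\pi}\int_{\frac12\pi}^{\theta}\psi_{k-1}(\phi(\theta'))\,d\theta'\qquad (\tfrac12\pi<\theta<\pi)\quad (k=1,2,\ldots). \tag{84}$$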

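In fact, by successive application of (83),
$$V_{\beta,\beta+\gamma}(\theta) = \frac{1}{2^{\beta+\gamma}}\bigl\{1+L^{2}\psi_{1}(\theta)\bigr\}+\frac{1}{(4\pi)^{2}}\int_{\frac12\pi}^{\theta}\int_{\frac12\pi}^{\phi(\theta_{1})}L^{4}V_{\beta,\beta+\gamma}(\phi(\theta_{2}))\,d\theta_{2}\,d\theta_{1}.$$
More generally,
$$V_{\beta,\beta+\gamma}(\theta) = \frac{1}{2^{\beta+\gamma}}\sum_{k=0}^{s}L^{2k}\psi_{k}(\theta)+\frac{1}{(4\pi)^{s+1}}\int_{\frac12\pi}^{\theta}\int_{\frac12\pi}^{\phi(\theta_{1})}\!\!\cdots\int_{\frac12\pi}^{\phi(\theta_{s})}L^{2s+2}V_{\beta,\beta+\gamma}(\phi(\theta_{s+1}))\,d\theta_{s+1}\,d\theta_{s}\ldots d\theta_{2}\,d\theta_{1}. \tag{85}$$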
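There are two possibilities.

(i) $\beta+\gamma$ even.

Setting $s=\tfrac12(\beta+\gamma)-1$,
$$\begin{aligned}
V_{\beta,\beta+\gamma}(\theta) &= \frac{1}{2^{\beta+\gamma}}\sum_{k=0}^{\frac12(\beta+\gamma)-1}L^{2k}\psi_{k}(\theta)+\frac{1}{(4\pi)^{\frac12(\beta+\gamma)}}\int_{\frac12\pi}^{\theta}\!\!\cdots\int_{\frac12\pi}^{\phi(\theta_{\frac12(\beta+\gamma)-1})}L^{\beta+\gamma}V_{\beta,\beta+\gamma}(\phi(\theta_{\frac12(\beta+\gamma)}))\,d\theta_{\frac12(\beta+\gamma)}\ldots d\theta_{1}\\
&= \frac{1}{2^{\beta+\gamma}}\sum_{k=0}^{\frac12(\beta+\gamma)-1}L^{2k}\psi_{k}(\theta)+\frac{1}{2^{\beta+\gamma}}(-1)^{\gamma}(\beta+\gamma)!\,\psi_{\frac12(\beta+\gamma)}(\theta),
\end{aligned} \tag{86}$$
since
$$L^{\beta+\gamma}\bigl(V_{\beta,\beta+\gamma}(\theta)\bigr) = (-1)^{\gamma}(\beta+\gamma)!. \tag{87}$$
Again,
$$L^{\beta+\gamma}\,.\,1 = \sum_{j=0}^{\beta+\gamma}(-1)^{j}\binom{\beta+\gamma}{j}\beta^{(\beta+\gamma-j)}\gamma^{(j)} = (-1)^{\gamma}\binom{\beta+\gamma}{\gamma}\beta!\,\gamma! = (-1)^{\gamma}(\beta+\gamma)!. \tag{88}$$
Hence
$$V_{\beta,\beta+\gamma}(\theta) = \frac{1}{2^{\beta+\gamma}}\sum_{k=0}^{\frac12(\beta+\gamma)}L^{2k}\psi_{k}(\theta),\qquad \beta+\gamma \text{ even}. \tag{89}$$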


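(ii) $\beta+\gamma$ odd.

Setting $s=\tfrac12(\beta+\gamma-3)$,
$$\begin{aligned}
V_{\beta,\beta+\gamma}(\theta) &= \frac{1}{2^{\beta+\gamma}}\sum_{k=0}^{\frac12(\beta+\gamma-3)}L^{2k}\psi_{k}(\theta)+\frac{1}{(4\pi)^{\frac12(\beta+\gamma-1)}}\int_{\frac12\pi}^{\theta}\!\!\cdots\int_{\frac12\pi}^{\phi(\theta_{\frac12(\beta+\gamma-3)})}L^{\beta+\gamma-1}V_{\beta,\beta+\gamma}(\phi(\theta_{\frac12(\beta+\gamma-1)}))\,d\theta_{\frac12(\beta+\gamma-1)}\ldots d\theta_{1}\\
&= \frac{1}{2^{\beta+\gamma}}\sum_{k=0}^{\frac12(\beta+\gamma-3)}L^{2k}\psi_{k}(\theta)+\frac{1}{2^{\beta+\gamma}}(-1)^{\gamma}(\beta-\gamma)(\beta+\gamma-1)!\,\psi_{\frac12(\beta+\gamma-1)}(\theta),
\end{aligned} \tag{90}$$
since
$$\begin{aligned}
L^{\beta+\gamma-1}\bigl(V_{\beta,\beta+\gamma}(\theta)\bigr) &= \sum_{j=0}^{\beta+\gamma-1}(-1)^{j}\binom{\beta+\gamma-1}{j}\beta^{(\beta+\gamma-1-j)}\gamma^{(j)}\,V_{j-\gamma+1,1}(\theta)\\
&= (-1)^{\gamma-1}\binom{\beta+\gamma-1}{\gamma-1}\beta!\,\gamma!\,V_{0,1}(\theta)+(-1)^{\gamma}\binom{\beta+\gamma-1}{\gamma}\beta!\,\gamma!\,V_{1,1}(\theta)\\
&= (-1)^{\gamma}\,\tfrac12(\beta-\gamma)(\beta+\gamma-1)!.
\end{aligned} \tag{91}$$
Again,
$$L^{\beta+\gamma-1}\,.\,1 = \sum_{j=0}^{\beta+\gamma-1}(-1)^{j}\binom{\beta+\gamma-1}{j}\beta^{(\beta+\gamma-1-j)}\gamma^{(j)} = (-1)^{\gamma}(\beta-\gamma)(\beta+\gamma-1)!. \tag{92}$$
Hence,
$$V_{\beta,\beta+\gamma}(\theta) = \frac{1}{2^{\beta+\gamma}}\sum_{k=0}^{\frac12(\beta+\gamma-1)}L^{2k}\psi_{k}(\theta),\qquad \beta+\gamma \text{ odd}. \tag{93}$$
Combining (89) and (93),
$$V_{\beta,\beta+\gamma}(\theta) = \frac{1}{2^{\beta+\gamma}}\sum_{k=0}^{[\frac12(\beta+\gamma)]}L^{2k}\psi_{k}(\theta), \tag{94}$$
where $[m]$ denotes, as usual, the integral part of $m$.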
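
For $\beta+\gamma\le 3$ only $\psi_{0}$ and $\psi_{1}$ enter (94), and the series can be compared directly with the elementary results (57)-(61). The sketch below assumes $\psi_{0}(\theta)=1$ and $\psi_{1}(\theta)=\theta/\pi-\tfrac12$ (the value obtained by integrating once from $\tfrac12\pi$ with the factor $1/\pi$), and reads $L^{2k}\psi_{k}$ as $\psi_{k}$ times $L^{2k}\,.\,1$ in the sense of (19); the function names are illustrative.

```python
import math


def falling(n, r):
    """Descending factorial n(n-1)...(n-r+1)."""
    out = 1
    for i in range(r):
        out *= n - i
    return out


def L_power_one(m, beta, gamma):
    """L^m applied to a quantity free of beta and gamma, equation (19)."""
    return sum((-1) ** j * math.comb(m, j) * falling(beta, m - j) * falling(gamma, j)
               for j in range(m + 1))


def psi(k, theta):
    """Auxiliary functions psi_0 and psi_1 (assumed forms, see lead-in)."""
    if k == 0:
        return 1.0
    if k == 1:
        return theta / math.pi - 0.5
    raise NotImplementedError("psi_k for k >= 2 is not elementary")


def V_series(beta, gamma, theta):
    """Right-hand side of (94) for beta + gamma <= 3."""
    n = beta + gamma
    return sum(L_power_one(2 * k, beta, gamma) * psi(k, theta) for k in range(n // 2 + 1)) / 2 ** n


if __name__ == "__main__":
    theta = math.acos(-0.25)
    print(V_series(0, 2, theta), theta / (2 * math.pi))            # (58)
    print(V_series(1, 1, theta), 0.5 * (1 - theta / math.pi))      # (59)
    print(V_series(3, 0, theta), 0.25 * (3 * theta / math.pi - 1)) # (60)
    print(V_series(1, 2, theta), 0.25 * (1 - theta / math.pi))     # (61)
```

Agreement of each printed pair is a check on the constant $1/2^{\beta+\gamma}$, the lower limit $\tfrac12\pi$ and the signs in (81).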

The character of the double series in the right-hand member of (94) is similar to that of the series in (65) and (68), and the parallelogram diagram may be used to depict this character, as well as to effect summation. The values of $V_{\beta,\beta+\gamma}(\theta)$, and in particular the values of $V_{\beta,\beta+\gamma}\bigl(\cos^{-1}-\tfrac{1}{m}\bigr)$ $(m=2,3,4,\ldots)$ needed for the specification of the moments in (65) and in (66), may be studied from the point of view of the auxiliary functions $\psi_{k}(\theta)$ and the special values $\psi_{k}\bigl(\cos^{-1}-\tfrac{1}{m}\bigr)$ $(m=3,4,5,\ldots)$, or, alternatively, from the point of view of the chain relationship (83). In either case, in view of the special nature of the arguments of the $\psi_{k}$'s and of the $V_{\beta,\beta+\gamma}$, the following transformation is useful:
$$\theta' = \cos^{-1}\left(-\frac{1}{x'}\right)\qquad (x'>1). \tag{95}$$
Defining
$$\Psi_{k}(x) = \psi_{k}(\theta), \tag{96}$$
$$\mathcal{V}_{\beta,\beta+\gamma}(x) = V_{\beta,\beta+\gamma}(\theta), \tag{97}$$
the following differential-difference equations for the $\Psi_{k}$, and the $\mathcal{V}_{\beta,\beta+\gamma}$ are derived from (84) and (83):
$$\Psi_{k}(x) = \frac{1}{\pi}\int_{x}^{\infty}\Psi_{k-1}(x'+2)\,\frac{dx'}{x'\sqrt{(x'^{2}-1)}}\qquad (k=1,2,\ldots), \tag{98}$$
$$\mathcal{V}_{\beta,\beta+\gamma}(x) = \frac{1}{2^{\beta+\gamma}}+\frac{1}{4\pi}\,L^{2}\int_{x}^{\infty}\mathcal{V}_{\beta,\beta+\gamma}(x'+2)\,\frac{dx'}{x'\sqrt{(x'^{2}-1)}}\qquad (\beta,\gamma=0,1,2,\ldots). \tag{99}$$

† It is known (Schläfli, 1950, pp. 240-3) that the content of a simplex on the surface of a sphere in $2m+1$ dimensions is a linear function of the contents of the simplices on the surfaces of spheres in dimensions 2, 4, 6, ..., $2m$. Equation (94) illustrates this property in a particular instance.


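The case of the extreme members of the sample ($\gamma=0$, $r=n$) is of special interest. Equations (65) and (66) reduce to
$$\mu'_{2k+1|n|n} = n\sum_{i=0}^{k}a_{2k+1,2i+1}(1)\binom{n-1}{2i+1}\frac{1}{(2\pi)^{\frac12(2i+1)}\,(2i+2)^{1/2}}\,V_{n-2-2i,\,n-2-2i}\!\left(\cos^{-1}-\frac{1}{2i+3}\right)\qquad (k=0,1,2,\ldots), \tag{100}$$
$$\mu'_{2k|n|n} = n\sum_{i=0}^{k}a_{2k,2i}(1)\binom{n-1}{2i}\frac{1}{(2\pi)^{i}\,(2i+1)^{1/2}}\,V_{n-1-2i,\,n-1-2i}\!\left(\cos^{-1}-\frac{1}{2i+2}\right)\qquad (k=0,1,2,\ldots). \tag{101}$$
The simplices involved in (100) and in (101) are regular with primary bounding angles $\cos^{-1}(-\tfrac13)$, $\cos^{-1}(-\tfrac15)$, ..., and $\cos^{-1}(-\tfrac12)$, $\cos^{-1}(-\tfrac14)$, ..., respectively. In terms of the $\mathcal{V}$'s,
$$\mu'_{2k+1|n|n} = n\sum_{i=0}^{k}a_{2k+1,2i+1}(1)\binom{n-1}{2i+1}\frac{1}{(2\pi)^{\frac12(2i+1)}\,(2i+2)^{1/2}}\,\mathcal{V}_{n-2-2i,\,n-2-2i}(3+2i)\qquad (k=0,1,2,\ldots), \tag{100'}$$
$$\mu'_{2k|n|n} = n\sum_{i=0}^{k}a_{2k,2i}(1)\binom{n-1}{2i}\frac{1}{(2\pi)^{i}\,(2i+1)^{1/2}}\,\mathcal{V}_{n-1-2i,\,n-1-2i}(2+2i)\qquad (k=0,1,2,\ldots). \tag{101'}$$
In the last four equations, the combinatorial coefficient $\binom{r}{s}$ is to be interpreted as zero if $s>r$.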


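Equation (83) reduces to
$$V_{\beta,\beta}(\theta) = \frac{1}{2^{\beta}}+\frac{\beta(\beta-1)}{4\pi}\int_{\frac12\pi}^{\theta}V_{\beta-2,\beta-2}(\phi(\theta'))\,d\theta'\qquad (\beta=0,1,2,\ldots), \tag{102}$$
or, in terms of the $\mathcal{V}$'s,
$$\mathcal{V}_{\beta,\beta}(x) = \frac{1}{2^{\beta}}+\frac{\beta(\beta-1)}{4\pi}\int_{x}^{\infty}\mathcal{V}_{\beta-2,\beta-2}(x'+2)\,\frac{dx'}{x'\sqrt{(x'^{2}-1)}}\qquad (\beta=0,1,2,\ldots). \tag{102'}$$
Equation (94) reduces to
$$V_{\beta,\beta}(\theta) = \frac{1}{2^{\beta}}\sum_{k=0}^{[\frac12\beta]}\beta^{(2k)}\,\psi_{k}(\theta)\qquad (\beta=0,1,2,\ldots). \tag{103}$$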


he 
It is easy to verify from (70) and (103) that
$$\psi_{k}(\tfrac23\pi) = \frac{1}{(2k+1)!} = \Psi_{k}(2)\qquad (k=0,1,2,\ldots). \tag{104}$$
Also, from (103) (together with the fact that $V_{\beta,\beta}(\pi)=\tfrac12$),
$$\psi_{k}(\pi) = \frac{1}{(2k)!} = \Psi_{k}(1)\qquad (k=0,1,2,\ldots). \tag{105}$$
Finally, equation (51) reduces to
$$\Phi_{n|n}(\theta) = n\,V_{n-1,n-1}\!\left(\cos^{-1}-\frac{1}{2-2\theta}\right)(1-2\theta)^{-\frac12} = n\,\mathcal{V}_{n-1,n-1}(2-2\theta)\,(1-2\theta)^{-\frac12}. \tag{106}$$


4, COMPUTATION OF THE CONTENTS OF REGULAR HYPERSPHERICAL SIMPLICES AND THE 
ASSOCIATED MOMENTS OF THE EXTREME ORDER STATISTICS IN SAMPLES FROM NORMAL 
POPULATIONS 


A knowledge of the contents of regular hyperspherical simplices is likely to be useful elsewhere in statistical applications and is not without theoretical interest. Table 1 provides the values of $\mathcal{V}_{\beta,\beta}(x)$ for $x=2(1)12$ and $\beta=1(1)49$. These are the values which arise in (100') and in (101') in the specification of the first ten moments of the extreme members in a sample containing up to fifty items drawn from a normal population. The values of the $\mathcal{V}_{\beta,\beta}(x)$ have been computed from the recurrence formula (102'). The values of the moments of the extreme members, given to a greater degree of accuracy than has hitherto been found computationally convenient, are provided in Table 2. These have been derived from equations (100') and (101'), using the values of the $a(1)$'s given at the end of §1 and the values of the $\mathcal{V}_{\beta,\beta}(x)$ given in Table 1. Table 3 gives the second, third and fourth moments about the mean of the extreme members in samples of all sizes up to and including $n=50$, as well as the standard deviations, together with $\sqrt{\beta_{1}}$ and $\beta_{2}-3$.
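
The recurrence (102') lends itself to direct numerical evaluation. The sketch below is one possible approach and not a description of how the published table was computed: it assumes the substitution $t=1/x'$, under which $dx'/\{x'\sqrt{(x'^{2}-1)}\}$ becomes $dt/\sqrt{(1-t^{2})}$ over $0\le t\le 1/x$ and the shifted argument $x'+2$ becomes $t/(1+2t)$, and it uses nested Simpson quadrature. The function names and the step count are illustrative.

```python
import math


def simpson(f, a, b, n=100):
    """Composite Simpson rule with n (even) sub-intervals."""
    if a == b:
        return 0.0
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0


def V_regular(beta, w, n=100):
    """Relative content of the regular simplex with argument x = 1/w (0 <= w < 1), via (102').

    The argument is passed as w = 1/x so that x = infinity corresponds to w = 0.
    """
    if beta == 0:
        return 1.0
    if beta == 1:
        return 0.5

    def integrand(t):
        # x' + 2 corresponds to w' = t / (1 + 2 t).
        return V_regular(beta - 2, t / (1.0 + 2.0 * t), n) / math.sqrt(1.0 - t * t)

    return 2.0 ** -beta + beta * (beta - 1) / (4.0 * math.pi) * simpson(integrand, 0.0, w, n)


if __name__ == "__main__":
    # x = 2: relation (70) gives 1/(beta + 1).
    print([round(V_regular(b, 0.5), 8) for b in range(5)])
    # x = 4: compare with the second column of Table 1.
    print([round(V_regular(b, 0.25), 8) for b in range(5)])
```

The cost grows quickly with $\beta$ because each level of the recurrence adds a nested quadrature, so a tabulation up to $\beta=49$ would need the intermediate functions to be stored on a grid rather than recomputed.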

It is hoped subsequently to compute by electronic means the values of the $\mathcal{V}_{\beta,\beta+\gamma}(x)$ (relative contents of skew hyperspherical simplices with primary bounding angles $\cos^{-1}(-1/x)$ and $\pi-\cos^{-1}(-1/x)$) from the generalized recurrence formula (99), and to apply these values in the general formulae (65) and (66) for the derivation of at least the first four moments of order statistics which are not extreme.

I am greatly indebted to Prof. Sir Ronald Fisher, F.R.S., for his advice and encouragement.
Table 1. Relative surface contents of regular hyperspherical simplices $\mathcal{V}_{n}(x)$ with dimensionality $n$ and primary bounding angles $\cos^{-1}(-1/x)$†

n | V_{n-1}(2) | V_{n-3}(4) | V_{n-5}(6) | V_{n-7}(8) | V_{n-9}(10) | V_{n-11}(12)
1 | 1.00000 00000 | | | | |
2 | 0.50000 00000 | | | | |
3 | 0.33333 33333 | 1.00000 0000 | | | |
4 | 0.25000 00000 | 0.50000 0000 | | | |
5 | 0.20000 00000 | 0.29021 5312 | 1.00000 0000 | | |
6 | 0.16666 66667 | 0.18532 2967 | 0.50000 0000 | | |
7 | 0.14285 71429 | 0.12647 9249 | 0.27665 0190 | 1.00000 000 | |
8 | 0.12500 00000 | 0.09065 9844 | 0.16497 5284 | 0.50000 000 | |
9 | 0.11111 11111 | 0.06748 2726 | 0.10422 4047 | 0.26994 654 | 1.00000 00 |
10 | 0.10000 00000 | 0.05175 6879 | 0.06893 46456 | 0.15491 982 | 0.50000 00 |
11 | 0.09090 90909 | 0.04067 3686 | 0.04732 96970 | 0.09344 505 | 0.26594 21 | 1.00000 0
12 | 0.08333 33333 | 0.03261 5725 | 0.03352 05107 | 0.05874 626 | 0.14891 32 | 0.50000 0
13 | 0.07692 30769 | 0.02660 3196 | 0.02437 08312 | 0.03824 650 | 0.08708 71 | 0.26327 8
14 | 0.07142 85714 | 0.02201 7121 | 0.01812 06201 | 0.02565 733 | 0.05286 14 | 0.14491 7
15 | 0.06666 66667 | 0.01845 2415 | 0.01373 76418 | 0.01766 4546 | 0.03313 989 | 0.08289 4
16 | 0.06250 00000 | 0.01563 5828 | 0.01059 32558 | 0.01244 0956 | 0.02137 108 | 0.04904 0
17 | 0.05882 35294 | 0.01337 8162 | 0.00829 19269 | 0.00893 9452 | 0.01412 973 | 0.02988 9
18 | 0.05555 55556 | 0.01154 5394 | 0.00657 75941 | 0.00653 8976 | 0.00955 140 | 0.01870 1
19 | 0.05263 15789 | 0.01004 0653 | 0.00528 02708 | 0.00486 0042 | 0.00658 678 | 0.01198 4
20 | 0.05000 00000 | 0.00879 26628 | 0.00428 45634 | 0.00366 44887 | 0.00462 3852 | 0.00783 97
21 | 0.04761 90476 | 0.00774 81250 | 0.00351 05571 | 0.00279 92407 | 0.00329 9116 | 0.00523 58
22 | 0.04545 45455 | 0.00686 66186 | 0.00290 19034 | 0.00216 37714 | 0.00238 8923 | 0.00355 63
23 | 0.04347 82608 | 0.00611 70792 | 0.00241 82187 | 0.00169 07679 | 0.00175 3346 | 0.00245 48
24 | 0.04166 66667 | 0.00547 53518 | 0.00203 01296 | 0.00133 43527 | 0.00130 2901 | 0.00171 96
25 | 0.04000 00000 | 0.00492 24500 | 0.00171 59797 | 0.00106 27478 | 0.00097 9287 | 0.00122 121
26 | 0.03846 15384 | 0.00444 33011 | 0.00145 96035 | 0.00085 36129 | 0.00074 3854 | 0.00087 824
27 | 0.03703 70370 | 0.00402 58300 | 0.00124 87951 | 0.00069 10262 | 0.00057 0572 | 0.00063 900
28 | 0.03571 42857 | 0.00366 02813 | 0.00107 42412 | 0.00056 34963 | 0.00044 1654 | 0.00047 000
29 | 0.03448 27586 | 0.00333 87120 | 0.00092 87651 | 0.00046 26319 | 0.00034 4771 | 0.00034 921


0-03333 33333 
0-03225 80645 
0-03125 00000 
0-03030 30303 
0-02941 17647 


0-02857 14285 
0-02777 77778 
0-02702 70270 
0-02631 57894 


0-02564 10256 | 


0-02500 00000 
0-02439 02439 
0-02380 95238 
0-02325 58139 


0-02272 72727 | 


0-02222 22222 | 


0-02173 91304 
0-02127 65957 
0-02083 33333 
0-02040 81632 


0-02000 00000 


0-00305 46077 
0-00280 25904 
0-00257 81889 
0-00237 76700 
0-00219 78937 


0-60203 62079 
0-00189 03610 
0-00175 84330 
0-00163 87790 
0-00152 99842 


0-00143 08270 
0-00134 02485 
0-00125 73278 
0-00118 12613 
0-00111 13455 


0-00104 69623 
0-00098 75678 





0-00093 26812 | 


0-00088 18772 


0-00080 67867 
0-00070 39300 
0-00061 67367 
0-00054 24524 
0-00047 88690 


0-00042 42044 
0-00037 70113 
0-00033 61078 
0-00030 05231 
0-00026 94559 


0-00024 22414 | 


0-00021 83255 
0-00019 72443 
0-00017 86078 
0-00016 20870 


0-00014 74026 
0-00013 43177 


0-00012 26296 | 


0-00011 21650 


0-00083 47778 | 0-00010 27750 





0-00038 22384 
0-00031 76966 
0-00026 52568 
0-00022 30929 
0-00018 83671 


0-00015 97894 
0-00013 61459 
0-00011 64857 
0-00010 00595 
0-00008 62731 





0-00007 46525 | 


0-00006 48171 
0-00005 64603 
0-00004 93332 
0-00004 32335 


0-00003 79951 
0-00003 34819 
0-00002 95815 
0-00002 62005 


| 0-00002 32614 


0-00079 10470 | 0-00009 43311 | 0-00002 06994 





0-00027 12811 
0-00021 50445 
0-00017 16569 
0-00013 79231 
0-00011 15029 


0-00009 06725 
0-00007 41397 
0-00006 09405 
0-00005 03360 
0-00004 17740 





0-00003 48236 | 


0-00002 91534 
0-00002 45055 
0-00002 06786 
0-00001 75139 


0-00001 48862 
0-00001 26956 


0-00001 08626 | 


0-00000 93233 
0-00000 80260 


0-00000 69292 





0-00026 1932 
0-00019 8216 
0-00015 1249 
0-00011 6318 
0-00009 0117 


0-00007 03046 
0-00005 51924 
0-00004 36264 
0-00003 46801 
0-00002 77157 


0-00002 22795 
0-00001 79963 
0-00001 46075 
0-00001 19117 
0-00000 97565 


0-00000 80238 
0-00000 66278 
0-00000 54951 
0-00000 45730 
0-00000 38192 


0-00000 32007 





+ For convenience, we have written V5, p(@) = (2). 
Table 1 (cont.)

 n  U_{n-2}(3)     U_{n-4}(5)     U_{n-6}(7)     U_{n-8}(9)     U_{n-10}(11)
 1  —              —              —              —              —
 2  1.00000 00000  —              —              —              —
 3  0.50000 00000  —              —              —              —
 4  0.30408 67240  1.00000 00000  —              —              —
 5  0.20613 00860  0.50000 00000  —              —              —
 6  0.14973 76529  0.28204 71084  1.00000 000    —              —
 7  0.11412 73223  0.17307 06626  0.50000 000    —              —
 8  0.09011 66373  0.11301 25446  0.27281 447    1.00000 00     —
 9  0.07311 43694  0.07741 35906  0.15922 171    0.50000 00     —
10  0.06060 81823  0.05508 32201  0.09803 672    0.26772 05     1.00000 0
11  0.05112 51858  0.04042 61431  0.06305 560    0.15158 07     0.50000 0
12  0.04375 34970  0.03044 14257  0.04205 400    0.08990 27     0.26448 9
13  0.03790 30207  0.02342 53126  0.02891 971    0.05545 54     0.14673 3
14  0.03317 76395  0.01836 38033  0.02041 603    0.03537 66     0.08479 5
15  0.02930 31093  0.01462 89404  0.01474 400    0.02323 49     0.05076 6
16  0.02608 44854  0.01181 83990  0.01086 154    0.01565 44     0.03135 0
17  0.02337 99954  0.00966 67260  0.00814 300    0.01078 73     0.01989 5
18  0.02108 44609  0.00799 42195  0.00620 083    0.00758 37     0.01293 8
19  0.01911 84699  0.00667 64417  0.00478 826    0.00542 79     0.00859 9
20  0.01742 11230  0.00562 54722  0.00374 4257   0.00394 826    0.00582 81
21  0.01594 50617  0.00477 80637  0.00296 1394   0.00291 427    0.00402 11
22  0.01465 29936  0.00408 79746  0.00236 6593   0.00217 989    0.00281 97
23  0.01351 52178  0.00352 08978  0.00190 9238   0.00165 053    0.00200 67
24  0.01250 78348  0.00305 10386  0.00155 3697   0.00126 376    0.00144 76
25  0.01161 14315  0.00265 87653  0.00127 4511   0.00097 762    0.00105 75
26  0.01081 01025  0.00232 89690  0.00105 3421   0.00076 350    0.00078 15
27  0.01009 07126  0.00204 99013  0.00087 6364   0.00060 155    0.00058 37
28  0.00944 23349  0.00181 23400  0.00073 3843   0.00047 785    0.00044 04
29  0.00885 58175  0.00160 89817  0.00061 8152   0.00038 249    0.00033 54
30  0.00832 34467  0.00143 39955  0.00052 35919  0.00030 8361   0.00025 762
31  0.00783 86833  0.00128 26902  0.00044 58018  0.00025 0263   0.00019 951
32  0.00739 59526  0.00115 12650  0.00038 14199  0.00020 4389   0.00015 569
33  0.00699 04786  0.00103 66196  0.00032 78320  0.00016 7911   0.00012 237
34  0.00661 81492  0.00093 62093  0.00028 29897  0.00013 8712   0.00009 684
35  0.00627 54084  0.00084 79336  0.00024 52764  0.00011 5194   0.00007 712
36  0.00595 91679  0.00077 00483  0.00021 34076  0.00009 6139   0.00006 180
37  0.00566 67346  0.00070 10983  0.00018 63565  0.00008 0613   0.00004 980
38  0.00539 57516  0.00063 98631  0.00016 32967  0.00006 7896   0.00004 035
39  0.00514 41482  0.00058 53144  0.00014 35599  0.00005 7428   0.00003 286
40  0.00491 00998  0.00053 65820  0.00012 66023  0.00004 8770   0.00002 689
41  0.00469 19927  0.00049 29266  0.00011 79793  0.00004 1576   0.00002 211
42  0.00448 83958  0.00045 37172  0.00009 93256  0.00003 5572   0.00001 827
43  0.00429 80362  0.00041 84134  0.00008 83397  0.00003 0542   0.00001 515
44  0.00411 97785  0.00038 65508  0.00007 87716  0.00002 6311   0.00001 262
45  0.00395 26071  0.00035 77288  0.00007 04131  0.00002 2738   0.00001 055
46  0.00379 56117  0.00033 16006  0.00006 30901  0.00001 9710   0.00000 885
47  0.00364 79741  0.00030 78655  0.00005 66566  0.00001 7136   0.00000 746
48  0.00350 89572  0.00028 62613  0.00005 09895  0.00001 4939   0.00000 630
49  0.00337 78955  0.00026 65593  0.00004 59848  0.00001 3059   0.00000 534
50  0.00325 41871  0.00024 85591  0.00004 15543  0.00001 1446   0.00000 454
Table 2. Moments of extreme members in samples of size n
drawn from normal populations

 n  1             2             3              4              5
 1  0             1.00000 0000  0              3.00000 0000   0
 2  0.56418 9584  1.00000 0000  1.41047 3959   3.00000 0000   6.06503 8023
 3  0.84628 4377  1.27566 4448  2.11571 0938   4.19454 5940   9.09755 7036
 4  1.02937 5374  1.55132 8896  2.70042 5704   5.38909 1881   11.88062 0268
 5  1.16296 4474  1.80002 0436  3.22487 9364   6.52339 5486   14.53895 5578
 6  1.26720 6361  2.02173 9069  3.70526 1794   7.59745 6758   17.09454 9080
 7  1.35217 8376  2.22030 4136  4.14966 7934   8.61704 4920   19.55839 3798
 8  1.42360 0306  2.39953 4975  4.56358 1597   9.58792 9198   21.93884 615
 9  1.48501 3162  2.56261 7418  4.95118 1032   10.51515 5116  24.24294 426
10  1.53875 2731  2.71210 3790  5.31580 4079   11.40304 4507  26.47678 168
11  1.58643 6352  2.85002 7742  5.66018 0737   12.25530 1015  28.64569 426
12  1.62922 7640  2.97801 9090  5.98657 7532   13.07511 5628  30.75438 816
13  1.66799 0177  3.09739 6615  6.29689 7766   13.86525 5250  32.80703 849
14  1.70338 1554  3.20923 8822  6.59275 5132   14.62813 4342  34.80736 837
15  1.73591 3445  3.31443 7059  6.87552 9415   15.36587 2528  36.75871 365
16  1.76599 1393  3.41373 5410  7.14640 9450   16.08034 1083  38.66407 622
17  1.79394 1981  3.50776 0835  7.40642 6717   16.77320 0671  40.52616 843
18  1.82003 1879  3.59704 6171  7.65648 1911   17.44593 2213  42.34745 031
19  1.84448 1512  3.68204 7852  7.89736 6176   18.09986 2296  44.13016 084
20  1.86747 5060  3.76315 9715  8.12977 8192   18.73618 4245  45.87634 459
21  1.88916 7917  3.84072 3854  8.35433 8148   19.35597 5699  47.58787 434
22  1.90969 2325  3.91503 9251  8.57159 9074   19.96021 3381  49.26647 044
23  1.92916 1713  3.98636 8684  8.78205 6290   20.54978 5557  50.91371 752
24  1.94767 4075  4.05494 4325  8.98615 5205   21.12550 263   52.53107 895
25  1.96531 4610  4.12097 2294  9.18429 7863   21.68810 616   54.11990 930
26  1.98215 7840  4.18463 6408  9.37684 8436   22.23827 665   55.68146 535
27  1.99826 9303  4.24610 1279  9.56413 7882   22.77664 017   57.21691 556
28  2.01370 6925  4.30551 4889  9.74646 7901   23.30377 420   58.72734 852
29  2.02852 2147  4.36301 0759  9.92411 4318   23.82021 261   60.21378 036
30  2.04276 0845  4.41870 9768  10.09732 9992  24.32645 008   61.67716 127
31  2.05646 4099  4.47272 1701  10.26634 7323  24.82294 588   63.11838 132
32  2.06966 8829  4.52514 6562  10.43138 0426  25.31012 735   64.53827 561
33  2.08240 8337  4.57607 5706  10.59262 7020  25.78839 276   65.93762 894
34  2.09471 2757  4.62559 2804  10.75027 0082  26.25811 404   67.31717 988
35  2.10660 9441  4.67377 4683  10.90447 9295  26.71963 905   68.67762 449
36  2.11812 3288  4.72069 2054  11.05541 2315  27.17329 371   70.01961 969
37  2.12927 7027  4.76641 0142  11.20321 5903  27.61938 383   71.34378 620
38  2.14009 1456  4.81098 9241  11.34802 6914  28.05819 679   72.65071 130
39  2.15058 5659  4.85448 5197  11.48997 3178  28.49000 300   73.94095 130
40  2.16077 7179  4.89694 9839  11.62917 4294  28.91505 728   75.21503 372
41  2.17068 2186  4.93843 1351  11.76574 2323  29.33360 007   76.47345 938
42  2.18031 5609  4.97897 4611  11.89978 2420  29.74585 847   77.71670 421
43  2.18969 1262  5.01862 1481  12.03139 3399  30.15204 729   78.94522 094
44  2.19882 1950  5.05741 1077  12.16066 8233  30.55236 992   80.15944 068
45  2.20771 9565  5.09538 0002  12.28769 4522  30.94701 916   81.35977 429
46  2.21639 5169  5.13256 2557  12.41255 4895  31.33617 792   82.54661 372
47  2.22485 9069  5.16899 0930  12.53532 7387  31.72001 998   83.72033 315
48  2.23312 0882  5.20469 5366  12.65608 5780  32.09871 054   84.88129 016
49  2.24118 9599  5.23970 4322  12.77489 9908  32.47240 679   86.02982 667
50  2.24907 3631  5.27404 4605  12.89183 5941  32.84125 849   87.16626 994
Table 2 (cont.)

 n  6              7             8             9             10
 1  15.00000 000   0             105.00000 0   0             945.00000
 2  15.00000 000   37.44808 36   105.00000 0   303.28716 3   945.00000
 3  21.79972 305   56.17212 54   155.23218 8   454.93074 5   1407.98355
 4  28.59944 610   74.16003 19   205.46437 7   603.73513 9   1870.96710
 5  35.22660 772   91.77987 07   255.06652 6   751.11993 7   2331.11618
 6  41.68120 792   109.07401 19  304.03863 8   897.19147 1   2788.43082
 7  47.97432 643   126.06364 04  352.40762 6   1042.00290 7  3242.99105
 8  54.11704 295   142.76617 39  400.20040 7   1185.60041 6  3694.87692
 9  60.11939 069   159.19714 68  447.44198 8   1328.02667 6  4144.16423
10  65.99035 636   175.37063 38  494.15547 2   1469.32149 3  4590.92456
11  71.73800 254   191.29946 19  540.36223 3   1609.52211 3  5035.22556
12  77.36959 015   206.99536 78  586.08209 7   1748.66346 5  5477.13127
13  82.89168 443   222.46912 62  631.33349 6   1886.77837 6  5916.70244
14  88.31024 450   237.73065 92  676.13361 2   2023.89775 2  6353.99673
15  93.63069 901   252.78913 03  720.49849 5   2160.05074 7  6789.06900
16  98.85801 015   267.65302 48  764.44316 9   2295.26490 1  7221.97147
17  103.99672 809  282.33021 97  807.98172 9   2429.56628 0  7652.75394
18  109.05103 766  296.82804 46  851.12741 8   2562.97958 8  8081.46396
19  114.02479 836  311.15333 50  893.89270 6   2695.52826 2  8508.14696
20  118.92157 897  325.31247 89  936.28934 9   2827.23458 7  8932.84640
21  123.74468 755  339.31145 76  978.32845 4   2958.11976 4  9355.60392
22  128.49719 757  353.15588 21  1020.02052 5  3088.20399 1  9776.45944
23  133.18197 078  366.85102 56  1061.37551 5  3217.50653 5  10195.45124
24  137.80167 727  380.40185 19  1102.40286 1  3346.04579 3  10612.61612
25  142.35881 312  393.81304 07  1143.11152 9  3473.83935 1  11027.98943
26  146.85571 609  407.08901 08  1183.51004 4  3600.90403 7  11441.60519
27  151.29457 942  420.23394 06  1223.60652 3  3727.25596 8  11853.49616
28  155.67746 420  433.25178 61  1263.40870 2  3852.91059 3  12263.69389
29  160.00631 037  446.14629 83  1302.92396 2  3977.88273 7  12672.22882
30  164.28294 657  458.92103 76  1342.15935 3  4102.18663 7  13079.13032
31  168.50909 90   471.57938 78  1381.12161 5  4225.83597 5  13484.42672
32  172.68639 92   484.12456 81  1419.81719 7  4348.84391 1  13888.14543
33  176.81639 16   496.55964 50  1458.25227 8  4471.22311 0  14290.31294
34  180.90053 95   508.88754 19  1496.43277 8  4592.98577 6  14690.95486
35  184.94023 16   521.11104 91  1534.36438 0  4714.14366 8  15090.09599
36  188.93678 64   533.23283 20  1572.05253 8  4834.70813 0  15487.76036
37  192.89145 81   545.25543 90  1609.50249 4  4954.69011 0  15883.97122
38  196.80544 02   557.18130 91  1646.71928 9  5074.10018 2  16278.75114
39  200.67987 00   569.01277 81  1683.70777 2  5192.94856 3  16672.12201
40  204.51583 21   580.75208 53  1720.47261 3  5311.24513 1  17064.10506
41  208.31436 17   592.40137 87  1757.01831 5  5428.99944 4  17454.72091
42  212.07644 81   603.96272 02  1793.34921 5  5546.22075 1  17843.98959
43  215.80303 72   615.43809 10  1829.46950 0  5662.91801 2  18231.93056
44  219.49503 42   626.82939 59  1865.38321 0  5779.09990 6  18618.56273
45  223.15330 61   638.13846 69  1901.09424 9  5894.77474 8  19003.90451
46  226.77868 38   649.36706 80  1936.60639 0  6009.95099 8  19387.97380
47  230.37196 46   660.51689 81  1971.92327 9  6124.63627 3  19770.78803
48  233.93391 36   671.58959 50  2007.04844 5  6238.83836 1  20152.36417
49  237.46526 56   682.58673 78  2041.98530 5  6352.56472 5  20532.71874
50  240.96672 72   693.50985 06  2076.73716 4  6465.82261 9  20911.86784
Table 3. The moments about the mean, the measures of skewness and of kurtosis and the standard
deviation of extreme members in samples of size n drawn from normal populations

 n  $\mu_2$      $\mu_3$      $\mu_4$     $\beta_1$   $\beta_2-3$  $\surd\mu_2$
 1  1.00000 000  0            3.00000 00  0           0            1.00000 00
 2  0.68169 011  0.07707 945  1.42279 69  0.01875 50  0.06174 43   0.82564 53
 3  0.55946 720  0.08919 934  0.97955 22  0.04543 59  0.12952 43   0.74797 54
 4  0.49171 524  0.09120 683  0.76459 74  0.06997 03  0.16231 77   0.70122 41
 5  0.44753 407  0.09058 710  0.64107 54  0.09154 92  0.20078 82   0.66897 99
 6  0.41592 711  0.08917 023  0.55943 20  0.11050 66  0.23379 78   0.64492 41
 7  0.39191 778  0.08753 521  0.50113 26  0.12728 60  0.26259 20   0.62603 34
 8  0.37289 714  0.08588 990  0.45721 20  0.14227 12  0.28805 84   0.61065 30
 9  0.35735 333  0.08431 165  0.42279 90  0.15576 92  0.31084 08   0.59779 04
10  0.34434 382  0.08282 697  0.39501 52  0.16802 22  0.33141 89   0.58680 82
11  0.33324 744  0.08144 141  0.37204 79  0.17922 15  0.35015 75   0.57727 59
12  0.32363 639  0.08015 182  0.35269 69  0.18951 99  0.36733 95   0.56889 05
13  0.31520 538  0.07895 167  0.33613 48  0.19904 06  0.38318 91   0.56143 15
14  0.30773 010  0.07783 336  0.32177 24  0.20788 42  0.39788 63   0.55473 43
15  0.30104 157  0.07678 934  0.30917 77  0.21613 37  0.41157 72   0.54867 26
16  0.29500 981  0.07581 253  0.29802 68  0.22385 80  0.42438 30   0.54314 81
17  0.28953 300  0.07489 647  0.28807 16  0.23111 52  0.43640 41   0.53808 27
18  0.28453 013  0.07403 540  0.27911 89  0.23795 45  0.44772 53   0.53341 37
19  0.27993 580  0.07322 415  0.27101 57  0.24441 81  0.45841 87   0.52908 96
20  0.27569 662  0.07245 817  0.26363 94  0.25054 19  0.46854 62   0.52506 82
21  0.27176 844  0.07173 345  0.25689 02  0.25635 77  0.47816 00   0.52131 41
22  0.26811 447  0.07104 635  0.25068 64  0.26189 23  0.48730 80   0.51779 77
23  0.26470 377  0.07039 370  0.24496 01  0.26716 97  0.49603 01   0.51449 37
24  0.26151 002  0.06977 268  0.23965 45  0.27221 11  0.50436 15   0.51138 05
25  0.25851 078  0.06918 079  0.23472 16  0.27703 53  0.51233 28   0.50843 96
26  0.25568 671  0.06861 576  0.23012 06  0.28165 88  0.51997 21   0.50565 47
27  0.25302 107  0.06807 556  0.22581 68  0.28609 63  0.52730 46   0.50301 20
28  0.25049 931  0.06755 837  0.22178 02  0.29036 13  0.53435 19   0.50049 91
29  0.24810 866  0.06706 255  0.21798 48  0.29446 55  0.54113 42   0.49810 51
30  0.24583 790  0.06658 663  0.21440 79  0.29841 97  0.54766 97   0.49582 04
31  0.24367 711  0.06612 926  0.21102 97  0.30223 38  0.55397 34   0.49363 66
32  0.24161 750  0.06568 923  0.20783 28  0.30591 67  0.56006 08   0.49154 60
33  0.23965 122  0.06526 541  0.20480 20  0.30947 61  0.56594 66   0.48954 19
34  0.23777 127  0.06485 680  0.20192 34  0.31291 97  0.57164 13   0.48761 80
35  0.23597 135  0.06446 249  0.19918 49  0.31625 42  0.57715 59   0.48576 88
36  0.23424 579  0.06408 161  0.19657 59  0.31948 58  0.58250 37   0.48398 95
37  0.23258 948  0.06371 339  0.19408 62  0.32262 00  0.58768 62   0.48227 53
38  0.23099 780  0.06335 710  0.19170 75  0.32566 20  0.59272 11   0.48062 23
39  0.22946 652  0.06301 212  0.18943 18  0.32861 71  0.59761 04   0.47902 66
40  0.22799 182  0.06267 778  0.18725 20  0.33148 92  0.60236 66   0.47748 49
41  0.22657 020  0.06235 357  0.18516 15  0.33428 30  0.60699 15   0.47599 39
42  0.22519 846  0.06203 894  0.18315 46  0.33700 20  0.61149 41   0.47455 08
43  0.22387 366  0.06173 341  0.18122 58  0.33964 99  0.61587 96   0.47315 29
44  0.22259 311  0.06143 652  0.17937 03  0.34222 99  0.62015 48   0.47179 77
45  0.22135 432  0.06114 786  0.17758 36  0.34474 54  0.62432 34   0.47048 31
46  0.22015 501  0.06086 702  0.17586 17  0.34719 89  0.62839 06   0.46920 68
47  0.21899 305  0.06059 368  0.17420 05  0.34959 37  0.63235 93   0.46796 69
48  0.21786 649  0.06032 745  0.17259 69  0.35193 19  0.63623 67   0.46676 17
49  0.21677 350  0.06006 804  0.17104 75  0.35421 62  0.64002 38   0.46558 94
50  0.21571 241  0.05981 512  0.16954 94  0.35644 83  0.64372 80   0.46444 85
STATISTICAL TREATMENT OF CENSORED DATA 
PART I. FUNDAMENTAL FORMULAE 


By F. N. DAVID AND N. L. JOHNSON 
University College, London 


1. In recently published papers a distinction has been made between samples which are 
said to be ‘censored’ and those which are called ‘truncated’. In the former case the total 
number of observations is known but full information is not available in regard to some of 
them. In the latter case not only is information lacking in regard to part of the sample but 
the total number of observations is not known. 

In this paper censored data will be considered, and for ease of presentation we shall con- 
sider only samples censored on the right. This means that the only information about the 
missing observations is their total number and the fact that each is greater than some 
known value. 

2. The methods which we intend to develop are based on the use of ordered variates. 
This is a natural way of approach, since in a censored distribution the rank of each observa- 
tion available is known. 

In the present paper we are concerned primarily with a number of basic formulae which 
will be used in later work. Some applications are described, but they are of simple type and 
restricted to the subclass of censored distributions wherein the method of censoring is such 
that the number of omitted values is fixed in advance and not determined by the number 
of observations which happen to be greater than some predetermined value. This means 
that in repeated sampling we will always have the same ordered variates available. Such 
situations arise, for example, in life-testing, if survival times of the first k individuals to 
die out of N are observed, as opposed to the system wherein all survival times up to a fixed 
duration are observed. (Gupta (1952) has classified these as Type I and Type II censoring 
respectively.) 

Cases of the latter type, and cases involving censoring of a more complicated type, will 
be considered in later papers. 

3. It will be assumed that there is a population of measurements which can be described 
by a continuous random variable with probability density function f(t). We shall write 


$$F(X) = \int_{-\infty}^{X} f(t)\,dt. \qquad (1)$$


Assume that a (complete) random sample of n observations has been drawn from this 
population. Let these observations, arranged in ascending order of magnitude, correspond 
to the random variables $x_1, x_2, \ldots, x_n$. Let $X_1, X_2, \ldots, X_n$ be defined by the equations 

$$F(X_r) = r/(n+1) \quad (r = 1, 2, \ldots, n). \qquad (2)$$

We will then formally expand $x_r$ about $X_r$ in an inverse Taylor series, obtaining 

$$x_r = X_r + X'_r h(x_r) + \tfrac{1}{2} X''_r [h(x_r)]^2 + \ldots, \qquad (3)$$

where $h(x_r) = F(x_r) - F(X_r) = F(x_r) - r/(n+1)$, 

$$X'_r = \frac{dX_r}{dF} = \left.\frac{dx}{dF}\right|_{x=X_r}, \qquad X''_r = \frac{d^2X_r}{dF^2} = \left.\frac{d^2x}{dF^2}\right|_{x=X_r},$$

and so on. This is a modification of a method of approach devised by K. Pearson (1931). 

It will not always be legitimate to expand in this way. In particular, if n is large and r is 
near either n or 1, convergence may be slow or non-existent. However, for the cases 
considered in this paper this expansion gives useful results. In the case of fourth-order 
cumulants we have given only the leading term in each expansion, and special calculations may 
sometimes be necessary to obtain an appreciation of their accuracy. 

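4. The inverse Taylor expansion enables us to find approximations to the moments of 
the $x_r$'s from the moments of the $h(x_r)$'s, and the latter moments are known. For example, 
the joint probability density function of $F(x_r)$, $F(x_s)$, $F(x_t)$ and $F(x_v)$ ($r<s<t<v$) is 

$$p(F_r, F_s, F_t, F_v) = \frac{n!}{(r-1)!\,(s-r-1)!\,(t-s-1)!\,(v-t-1)!\,(n-v)!}\; F_r^{\,r-1}(F_s-F_r)^{s-r-1}(F_t-F_s)^{t-s-1}(F_v-F_t)^{v-t-1}(1-F_v)^{n-v},$$

where $F_r = F(x_r)$ and so on. 
It follows that 

$$E(F_r^{\,l} F_s^{\,m} F_t^{\,g} F_v^{\,h}) = \frac{n!\,(r+l-1)!\,(s+l+m-1)!\,(t+l+m+g-1)!\,(v+l+m+g+h-1)!}{(n+l+m+g+h)!\,(r-1)!\,(s+l-1)!\,(t+l+m-1)!\,(v+l+m+g-1)!}.$$

In general 

$$E\Big(\prod_{j=1}^{g} F_{r_j}^{\,l_j}\Big) = \frac{n!}{\big(n+\sum_{j=1}^{g} l_j\big)!}\,\prod_{j=1}^{g}\frac{\big(r_j+\sum_{i=1}^{j} l_i-1\big)!}{\big(r_j+\sum_{i=1}^{j-1} l_i-1\big)!}. \qquad (4)$$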


Central moments and cumulants derived from (4) are shown in Table 1. In this table it 
was found convenient to use the symbols $p_r = r/(n+1)$ and $q_r = 1 - p_r$. 
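
The factorial formula for the product moments of the $F(x_r)$ can be evaluated exactly with rational arithmetic. The following is a minimal sketch, not part of the original derivation; the function name `product_moment` and its argument layout are hypothetical. It checks that the variance of $F(x_r)$ is $p_r q_r/(n+2)$ and that the covariance of $F(x_r)$ and $F(x_s)$ ($r<s$) is $p_r q_s/(n+2)$.

```python
from fractions import Fraction
from math import factorial

def product_moment(n, ranks, powers):
    """E[prod_j F(x_{r_j})^{l_j}] for increasing ranks r_1 < r_2 < ...
    in a sample of n, following the factorial formula of section 4."""
    value = Fraction(factorial(n), factorial(n + sum(powers)))
    cumulative = 0  # sum of the powers attached to the lower ranks
    for r, l in zip(ranks, powers):
        value *= Fraction(factorial(r + cumulative + l - 1),
                          factorial(r + cumulative - 1))
        cumulative += l
    return value

n, r, s = 9, 3, 7
p_r, p_s = Fraction(r, n + 1), Fraction(s, n + 1)
mean_r = product_moment(n, [r], [1])
var_r = product_moment(n, [r], [2]) - mean_r**2
cov_rs = product_moment(n, [r, s], [1, 1]) - mean_r * product_moment(n, [s], [1])
print(mean_r, var_r, cov_rs)
```

Exact fractions are used so that the identities can be confirmed without rounding error.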

Combining these results with (3) we obtain formal expansions for the moments of $x_r$. 
Care is necessary to include all terms of appropriate order in the expansion. The variable 
form of the denominator in (4) leaves some choice in the form of expansion to use. The 
expansion may be in powers of $n^{-1}$, or $(n+1)^{-1}$, or $(n+2)^{-1}$ and so on. In Table 2 we have 
chosen expansion in powers of $(n+2)^{-1}$ for the following reasons: 

(i) It has the simplest form for the first two moments. 
(ii) It gives the closest approximation to the exact results of Hojo (1931) for the expected 
value of quartiles from Normal populations. 

(iii) It gives exact results for the first two moments of any ordered variate when the 
population is rectangular. 

These are not, of course, at all conclusive arguments, and in many cases some other 
function of n may give a more quickly convergent expansion. 

5. We will now digress to give a few interesting results obtainable from Table 2. This 
table gives approximate expressions for the moments and product-moments of ordered 
variates in samples from any population for which there is a probability-density function 
which is differentiable as many times as desired. Some special cases will now be considered. 
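(i) Median. Assuming n is odd and writing $m = \tfrac{1}{2}(n+1)$, we have, to the appropriate 
order in $(n+2)^{-1}$, 

$$E(x_m) = X_m + \frac{1}{8(n+2)}X''_m + \frac{1}{128(n+2)^2}X^{iv}_m,$$

$$\kappa_2(x_m) = \frac{1}{4(n+2)}X'^{2}_m + \frac{1}{32(n+2)^2}\,(2X'_mX'''_m + X''^{2}_m),$$

$$\kappa_3(x_m) = \frac{3}{16(n+2)^2}\,X'^{2}_mX''_m,$$

$$\kappa_4(x_m) = \frac{1}{16(n+2)^3}\,(3X'^{2}_mX''^{2}_m + X'^{3}_mX'''_m - 6X'^{4}_m),$$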
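$$\surd\beta_1(x_m) = \frac{3}{2(n+2)^{1/2}}\,\frac{X''_m}{X'_m},$$

$$\beta_2(x_m) = 3 + \frac{1}{n+2}\left(\frac{X'''_m}{X'_m} + \frac{3X''^{2}_m}{X'^{2}_m} - 6\right).$$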


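Using the inverse Edgeworth (or Cornish-Fisher, 1937) expansion, formal expansions for 
percentage points of the distribution of the median may be obtained. Thus, if $x_{m,\alpha}$ be the 
standardized upper 100α % point of the distribution of the median, i.e. 

$$P\{x_m - E(x_m) > x_{m,\alpha}[\kappa_2(x_m)]^{1/2}\} = \alpha,$$

then, to order $(n+2)^{-1}$, 

$$x_{m,\alpha} = \lambda_\alpha + \frac{1}{4(n+2)^{1/2}}\,\frac{X''_m}{X'_m}\,(\lambda_\alpha^2 - 1) + \frac{1}{n+2}\left\{\frac{1}{12}\left(\frac{X'''_m}{2X'_m} - 3\right)(\lambda_\alpha^3 - 3\lambda_\alpha) - \frac{X''^{2}_m}{16X'^{2}_m}\,\lambda_\alpha\right\},$$

where $\lambda_\alpha$ is the upper 100α % point of the unit Normal distribution. 
Results for four special forms of distribution are summarized below. 

Name        | Probability density function   | $X'_m$          | $X''_m$ | $X'''_m$        | $\kappa_2(x_m)$                                            | $x_{m,\alpha} - \lambda_\alpha$
Normal      | $(2\pi)^{-1/2}e^{-x^2/2}$      | $(2\pi)^{1/2}$  | 0       | $(2\pi)^{3/2}$  | $\frac{\pi}{2(n+2)} + \frac{\pi^2}{4(n+2)^2}$              | $\frac{\pi-3}{12(n+2)}(\lambda_\alpha^3 - 3\lambda_\alpha)$
Rectangular | $1\ (0<x<1)$                   | 1               | 0       | 0               | $\tfrac14(n+2)^{-1}$                                       | $-\tfrac14(n+2)^{-1}(\lambda_\alpha^3 - 3\lambda_\alpha)$
Cauchy      | $\pi^{-1}(1+x^2)^{-1}$         | $\pi$           | 0       | $2\pi^3$        | $\tfrac14\pi^2(n+2)^{-1} + \tfrac18\pi^4(n+2)^{-2}$        | $\tfrac{1}{12}(\pi^2-3)(n+2)^{-1}(\lambda_\alpha^3 - 3\lambda_\alpha)$
Exponential | $e^{-x}\ (x>0)$                | 2               | 4       | 16              | $(n+2)^{-1} + \tfrac52(n+2)^{-2}$                          | $\tfrac12(n+2)^{-1/2}(\lambda_\alpha^2 - 1) + \tfrac{1}{12}(n+2)^{-1}(\lambda_\alpha^3 - 6\lambda_\alpha)$

(As noted above, the moments for the rectangular distribution are exact.) 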


(ii) Correlation between ordered variates. To order $(n+2)^{-1}$ the formal expression for the 
correlation between $x_r$ and $x_s$ ($r<s$) is 


ah (pa) [awe RP Rr Pret Hin Pol 
The following table compares values of ρ(r, s) for a Normal population calculated from the 
above approximation with exact values given by Godwin (1949). 


Values of ρ(r, s) 

 n   r   s   Approx.   Exact
 3   1   3   0.2958    0.2947
 4   1   4   0.2123    0.2129
     2   3   0.6571    0.6546
 9   1   9   0.0842    0.0869
     2   8   0.2295    0.2291
     3   7   0.4153    0.4144
     4   6   0.6615    0.6606











(iii) Quartiles and interquartile distance. The upper quartile will be defined as $x_u$, where 
$u = \tfrac34(n+1)$, and the lower quartile as $x_l$, where $l = \tfrac14(n+1)$, so that $p_u = \tfrac34$, $p_l = \tfrac14$. For both 
u and l to be integers it is clear that (n+1) must be a multiple of four. For a unit Normal 
population we find, to order $(n+2)^{-3}$, 

$$\mathcal{E}(x_u) = 0.67449 + \frac{0.62617}{n+2} + \frac{0.96810}{(n+2)^2} + \frac{1.00287}{(n+2)^3}$$

and 

$$\operatorname{var}(x_u) = \frac{1.85677}{n+2} + \frac{3.42763}{(n+2)^2} + \frac{5.35871}{(n+2)^3}.$$

Comparisons with Hojo's exact values are shown below. 

          Expectation           Variance
  n    Approx.    Exact      Approx.    Exact
  7    0.75739    0.75737    0.25598    0.25673
 11    0.72885    0.72885    0.16555    0.16571

For the exponential population we have, to order $(n+2)^{-3}$, 

$$\mathcal{E}(x_u) = \log_e 4 + \frac{3}{2(n+2)} + \frac{11}{4(n+2)^2} + \frac{4}{(n+2)^3},$$

$$\mathcal{E}(x_l) = \log_e \tfrac43 + \frac{1}{6(n+2)} + \frac{25}{108(n+2)^2} + \frac{8}{27(n+2)^3}.$$

When n = 11 we find that these formulae give $\mathcal{E}(x_u) = 1.51977$, $\mathcal{E}(x_l) = 0.30201$, as 
compared with exact values 1.51988 and 0.30202 respectively. 

The expected value of the interquartile distance ($J = x_u - x_l$) is, of course, $\mathcal{E}(x_u) - \mathcal{E}(x_l)$. 
The variance and higher moments of J are obtained by straightforward application of results 
in Table 2. 
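
The comparison quoted for the exponential population with n = 11 can be reproduced numerically. The sketch below is illustrative only and the function names are hypothetical. It assumes the standard result that the expected value of the r-th ordered variate from the unit exponential population is 1/n + 1/(n-1) + ... + 1/(n-r+1), and uses the expansion coefficients 3/2, 11/4, 4 and 1/6, 25/108, 8/27 in powers of $(n+2)^{-1}$.

```python
from fractions import Fraction
from math import log

def exact_mean_exponential(n, r):
    """Exact expectation of the r-th ordered variate (unit exponential)."""
    return float(sum(Fraction(1, n - i) for i in range(r)))

def approx_upper_quartile(n):
    """Expansion for the upper quartile, rank u = 3(n+1)/4."""
    N = n + 2
    return log(4) + 3 / (2 * N) + 11 / (4 * N**2) + 4 / N**3

def approx_lower_quartile(n):
    """Expansion for the lower quartile, rank l = (n+1)/4."""
    N = n + 2
    return log(4 / 3) + 1 / (6 * N) + 25 / (108 * N**2) + 8 / (27 * N**3)

n = 11
u, l = 3 * (n + 1) // 4, (n + 1) // 4
print(approx_upper_quartile(n), exact_mean_exponential(n, u))
print(approx_lower_quartile(n), exact_mean_exponential(n, l))
```

The two printed pairs correspond to the approximate and exact values quoted above for the upper and lower quartiles.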
6. We will now consider the problem of testing certain simple hypotheses, restricting 
ourselves to data censored on the right. We will suppose that the lowest k of a random sample 
of n observations are recorded. Since we are assuming that the lowest k observations 
are always available we can refer them to the distributions for which we have obtained 
formal expansions for moments and product-moments. In the case of a simple hypothesis 
we can make use of the probability-integral transformation to reduce our problem to that 
of dealing with a rectangular population with consequent simplification of many of our 
formulae. 

Let the available observations in ascending order be $t_1, t_2, \ldots, t_k$, and denote by f(t) the 
common probability-density function specified by the simple hypothesis H. Then we can 
apply the probability integral transformation 

$$x_r = \int_{-\infty}^{t_r} f(t)\,dt.$$

If H is valid then the variates $x_1, \ldots, x_k$ will be distributed as ordered variates from the 
rectangular population p(x) = 1 (0 < x < 1). 

If the alternative hypotheses simply specify a change in a pure location parameter, a 
natural criterion to use (if available) is the median, $t_m$, of the original observations. In many 
cases this might well be preferable to using $x_m$, but $x_m$ does possess the advantage of 
generality. 

When n is odd the upper 100α % point $x_{m,\alpha}$ of the distribution of $x_m$ is given, in terms of 
the Incomplete beta-function ratio, by the equation 

$$I_{x_{m,\alpha}}\big(\tfrac12(n+1),\ \tfrac12(n+1)\big) = 1 - \alpha.$$

A few values of $x_{m,\alpha}$ are given below. Further values can be obtained as required from the 
tables of C. M. Thompson (1941). 

   n      α = 0.05                            α = 0.01
   5      0.811 (0.812)                       0.894 (0.921)
  15      0.700 (0.700)                       0.767 (0.777)
  25      0.659 (0.658)                       0.719 (0.721)
 Large    $\tfrac12(1 + 1.645(n+2)^{-1/2})$    $\tfrac12(1 + 2.326(n+2)^{-1/2})$

The values in brackets were calculated from the formula for $x_{m,\alpha}$ given in § 5. 
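
For odd n the incomplete beta-function equation can also be solved directly without tables. The sketch below is an illustration under stated assumptions, not the method used for the tabulated values: it uses the standard identity that the distribution function of the median of n rectangular variates equals the probability that at least $\tfrac12(n+1)$ of them fall below x, and finds the root by bisection. The function names are hypothetical.

```python
from math import comb

def median_cdf(n, x):
    """P(median <= x) for n (odd) rectangular variates on (0, 1):
    at least (n+1)/2 of the n observations must not exceed x."""
    m = (n + 1) // 2
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(m, n + 1))

def upper_point(n, alpha, tol=1e-10):
    """Upper 100*alpha % point of the median, found by bisection."""
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if median_cdf(n, mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n in (5, 15, 25):
    print(n, round(upper_point(n, 0.05), 3), round(upper_point(n, 0.01), 3))
```

Bisection is chosen here only for simplicity; any root-finder would serve, since the distribution function is monotone in x.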
When n is even, the statistic Z,, = }(24(,_1) +24») May be used. An approximate equation 
for the upper 100a % point Z,, , is 


7 a 1 +n-, $n + 1 +n-) = 1 — @. 


7. It will be recognized that the foregoing analysis is not comprehensive. It omits con- 
sideration of the situation arising if the median is not among the observed values, and does 
not take into account explicitly any observed values except the median. 

In the notation of §3 we have $X_r = r/(n+1) = \mathcal{E}(x_r)$ if the simple hypothesis H is valid. 
A change in the pure location parameter will tend to make the quantities $\mathcal{E}(x_r - X_r)$ all 
positive or all negative. We will therefore consider 

$$\bar{x}_k = k^{-1}\sum_{i=1}^{k}(x_i - X_i) = k^{-1}\sum_{i=1}^{k} x_i - \tfrac12(n+1)^{-1}(k+1)$$
as a possible criterion. We note that $\bar{x}_k$ has a finite range of distribution, from $-\dfrac{k+1}{2(n+1)}$ to 
$1 - \dfrac{k+1}{2(n+1)}$. When H is valid the moments of $\bar{x}_k$ are given by the following equations: 

$$\mathcal{E}(\bar{x}_k) = 0,$$

$$\kappa_2(\bar{x}_k) = k^{-2}(n+2)^{-1}\Big[\sum_{i=1}^{k} p_iq_i + 2\sum_{i<j}^{k} p_iq_j\Big]$$

$$= k^{-2}(n+1)^{-2}(n+2)^{-1}\Big[\sum_{i=1}^{k} i(n-i+1) + 2\sum_{i<j}^{k} i(n-j+1)\Big]$$

$$= \frac{k+1}{12k(n+1)^2(n+2)}\,\{2(n+1)(2k+1) - 3k(k+1)\},$$

$$\kappa_3(\bar{x}_k) = \frac{(k+1)^2(n-k+1)(n-k)}{2k(n+1)^3(n+2)(n+3)},$$

$$\kappa_4(\bar{x}_k) = \frac{k+1}{120k^3(n+1)^4(n+2)(n+3)(n+4)}\,\big[24(n+1)^3(2k+1)(3k^2+3k-1)$$

$$\qquad - 40(n+1)^2k(k+1)(13k^2+13k+1) + 300(n+1)k^2(k+1)^2(2k+1)$$

$$\qquad - 225k^3(k+1)^3 - 5(n+2)^{-1}k(k+1)\{2(n+1)(2k+1) - 3k(k+1)\}^2\big].$$

The evaluation of the above results involved the calculation of certain product sums of 
the first k natural numbers. Some of the results used were new to us in that we were not 
acquainted with any publication containing them. A summary of the results is appended in 
Table 3, and we shall use them in further work. 
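
The closed form for the second cumulant can be checked against the defining double sum. The sketch below is a verification aid rather than part of the original argument; the function names are hypothetical, and it assumes, as in § 4, that under H the covariance of $x_i$ and $x_j$ ($i \le j$) is $p_iq_j/(n+2)$ with $p_i = i/(n+1)$.

```python
from fractions import Fraction

def kappa2_direct(n, k):
    """Second cumulant of the mean of the k lowest ordered rectangular
    variates, summed term by term from cov(x_i, x_j) = p_i q_j/(n+2)."""
    p = [Fraction(i, n + 1) for i in range(k + 1)]  # p[i] = i/(n+1)
    total = sum(p[i] * (1 - p[i]) for i in range(1, k + 1))
    total += 2 * sum(p[i] * (1 - p[j])
                     for j in range(2, k + 1) for i in range(1, j))
    return total / (k * k * (n + 2))

def kappa2_closed(n, k):
    """Closed form (k+1){2(n+1)(2k+1) - 3k(k+1)} / {12k(n+1)^2(n+2)}."""
    return (Fraction(k + 1, 12 * k * (n + 1)**2 * (n + 2))
            * (2 * (n + 1) * (2 * k + 1) - 3 * k * (k + 1)))

agree = all(kappa2_direct(n, k) == kappa2_closed(n, k)
            for n in range(1, 13) for k in range(1, n + 1))
print(agree, kappa2_closed(10, 10))
```

When k = n the statistic is the mean of a complete rectangular sample, so the closed form should reduce to 1/(12n).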

8. The distribution of $\bar{x}_k$ is remarkably skew when k is small compared with n. Values of 
$\beta_1$ and $\beta_2$ for the case n = 10 are shown in the following table: 

 k =          1     2     3     4     5     6     7     8     9     10
 $\beta_1$ =  2.30  1.22  0.70  0.41  0.24  0.13  0.06  0.02  0.00  0.00
 $\beta_2$ =  5.78  4.47  3.75  3.34  3.09  2.95  2.87  2.84  2.85  2.88

In view of the finite range of variation of $\bar{x}_k$ it was thought reasonable to consider using a 
Pearson Type I curve as an approximation to its distribution. Fitting a Type I curve to 
give the correct range of variation, mean and standard deviation, gave rather 
unsatisfactory results in that the third and fourth central moments of the fitted curve were much 
less than those of $\bar{x}_k$. Type I curves having correct values for the first four moments were 
therefore used, values of the standardized deviation of the upper and lower 5 % points 
being obtained from the tables of Pearson & Merrington (1951). Values were also obtained 
using the Cornish-Fisher formula. The figures were in reasonable agreement with each other, 
though they cannot be relied upon to be sufficiently accurate for purposes of tabulation. 
They do, however, provide a standard of comparison for the approximation developed in 
the next section. 


9. Consider the variable 

$$y_k = \bar{x}_k + \tfrac12(n+1)^{-1}(k+1) = k^{-1}\sum_{i=1}^{k} x_i.$$

The range of variation of $y_k$ is from 0 to 1: 

$$\mathcal{E}(y_k) = \frac{k+1}{2(n+1)} = P_k,$$

$$\operatorname{var}(y_k) = \tfrac13(n+2)^{-1}P_k(2 + k^{-1} - 3P_k) = \tfrac13 n^{-1}P_k(2 - 3P_k)F_k^2,$$

where $F_k$ is a correcting factor which approaches unity as n and k increase (see table at end 
of this section). 
Since $\mathcal{E}(y_k) = P_k$ and $\operatorname{var}(y_k) \doteq \tfrac13 n^{-1}P_k(2 - 3P_k)$, 
an appropriate variance equalizing transformation is 

$$Y_k = \frac{2}{\surd 3}\sin^{-1}\sqrt{\tfrac32 y_k} = \frac{1}{\surd 3}\cos^{-1}(1 - 3y_k) \qquad (0 < Y_k < \pi/\surd 3).$$

Approximately $\mathcal{E}(Y_k) = \dfrac{1}{\surd 3}\cos^{-1}(1 - 3P_k)$ 
and $\operatorname{var}(Y_k) = \tfrac13 n^{-1}$. 

We note that if $y_k > \tfrac23$ no corresponding value of $Y_k$ can be obtained. It is to be expected, 
therefore, that the transformation will be useful only if $P\{y_k > \tfrac23\}$ is small. Since 

$$\mathcal{E}(y_k) = P_k < \tfrac12,$$

this will be the case for sufficiently large n. 

Assuming $Y_k$ to be normally distributed with mean and variance $\dfrac{1}{\surd 3}\cos^{-1}(1 - 3P_k)$ and 
$\tfrac13 n^{-1}$ respectively, we obtain the following upper and lower 5 % points of $\bar{x}_k$. Values of 
$F_k$ and approximate upper and lower 5 % points obtained from the Cornish-Fisher expansion 
and from the fitted Type I are also shown in this table. 

The use of the transformation as a basis for a quick significance test requires some further 
study in points of detail, but it appears likely to give reasonably useful practical results 
if n is bigger than 20 and $|kn^{-1} - \tfrac12| < 0.3$. 


Percentage points of $\bar{x}_k$

                  Type I              Expansion          Transformation
  n    k    Lower 5%  Upper 5%   Lower 5%  Upper 5%   Lower 5%  Upper 5%    $F_k$
 10    2    -0.113†    0.171     -0.118     0.174     -0.108     0.160     1.047
       4    -0.142     0.182     -0.146     0.182     -0.143     0.171     0.996
       6    -0.160     0.179     -0.160     0.181     -0.163     0.168     0.983
       8    -0.161     0.169     -0.162     0.171     -0.171     0.151     0.984
 20    4    -0.079     0.109     -0.079     0.109     -0.077     0.106     1.024
       8    -0.103     0.124     -0.103     0.123     -0.104     0.120     0.997
      12    -0.115     0.127     -0.115     0.127     -0.118     0.121     0.990
      16    -0.118     0.119     -0.117     0.120     -0.122     0.112     0.991

† Obtained by extrapolation from Pearson & Merrington’s tables.
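
The transformation of § 9 can be sketched numerically as follows. This is only an illustration under stated assumptions: the function names `P`, `F` and `percent_points` are ours, not the authors'; $P_k$ is taken to be $(k+1)/\{2(n+1)\}$; $F_k$ is taken to be the square root of the ratio of the exact to the approximate variance of $y_k$; and 1.6449 is used as the upper 5% point of the standard Normal distribution.

```python
import math

Z_05 = 1.6449  # upper 5% point of the standard Normal distribution


def P(n, k):
    """Assumed mean of y_k: P_k = (k + 1) / (2(n + 1))."""
    return (k + 1) / (2 * (n + 1))


def F(n, k):
    """Correcting factor: sqrt(exact variance / approximate variance) of y_k."""
    p = P(n, k)
    exact = p * (2 + 1 / k - 3 * p) / (3 * (n + 2))
    approx = p * (2 - 3 * p) / (3 * n)
    return math.sqrt(exact / approx)


def percent_points(n, k, z=Z_05):
    """Approximate lower and upper 5% points of x-bar_k = y_k - P_k.

    Y_k = (1/2) arccos(1 - 3 y_k) is treated as Normal with mean
    (1/2) arccos(1 - 3 P_k) and variance 1/(4n); the limits are then
    transformed back with y = (2/3) sin^2(Y).
    """
    p = P(n, k)
    centre = 0.5 * math.acos(1 - 3 * p)
    half_width = z * 0.5 / math.sqrt(n)

    def back(Y):
        # Keep Y inside the admissible range 0 <= Y <= pi/2 before inverting.
        Y = min(max(Y, 0.0), math.pi / 2)
        return (2 / 3) * math.sin(Y) ** 2 - p

    return back(centre - half_width), back(centre + half_width)


lower, upper = percent_points(10, 2)
factor = F(10, 2)
```

For n = 10, k = 2 this sketch gives values close to the tabulated 'Transformation' entries (about -0.108 and 0.160) and to the tabulated $F_k$ (about 1.047).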


10. The test criterion discussed in the previous sections is evidently not the most sensitive
possible, even for detecting a change in a pure location parameter. It would be natural to
use a weighted sum $\sum_{i=1}^{k} a_i(x_i - X_i)$, the weights being chosen to make $\sum a_i x_i$ the best linear
unbiased estimator of $\mu$ when the null hypothesis is valid. Gupta (1952, pp. 265-7) has
calculated the coefficients for an analogous statistic, $\mu^*$, in the case of the Normal distribution.
This statistic was put forward by the author as an estimator of the mean of an unknown
Normal population, given a censored sample, but it could be used to test the hypothesis we
are considering. It is therefore of some interest to compare the sensitivity of tests based
on $\mu^*$ and on $\bar{x}_k$ respectively, when applied to testing the mean value of a Normal distribution
with unit variance. Here we will attempt only a very rough local comparison of the two
criteria, based on the ‘limiting non-centrality rate’ defined as


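$$\left[\frac{\partial\,\mathcal{E}(\text{criterion})}{\partial(\text{location parameter})} \Big/ \text{standard deviation of criterion}\right]_{\text{null hypothesis}}.$$

For $\mu^*$ this is equal to $[\operatorname{var}(\mu^*)]^{-\frac{1}{2}}$. For $\bar{x}_k$ we use the approximation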

















0€ (%;,) me s Z, ~—Piti 
0 (location parameter) *k;=, \* 2(n+2)Z,)’ 
1 
where Z,= +7} 
Bat... eee 
d 1 (™ was asf 
” rite ee ed 
The values obtained when n = 10 are compared in the following table:

  k     (1) $\bar{x}_k$     (2) $\mu^*$     Ratio (2):(1)
  2        2.266              0.942            0.416
  4        2.722              2.056            0.755
  6        2.975              2.736            0.920
  8        3.129              3.050            0.975
 10        3.211              3.162            0.985














The comparison is, of course, very rough and the results for k = 10 appear anomalous,
but there does seem to be some evidence in favour of $\bar{x}_k$, at any rate for the smaller values
of k. As compared with $\mu^*$, $\bar{x}_k$ requires the additional labour of applying the probability
integral transformation, but it does not require a special table of weights for its calculation,
and the approximation in § 9 will often enable a rapid test of significance to be made.

11. The main purpose of the present paper has been to present formulae which it is 
intended to use in future work. The new applications discussed have been selected for 
illustrative purposes. Further subjects which it is hoped to discuss later are: 

(a) the development of tests involving polynomials in the functions $(x_i - X_i)$ of the latter
part of this paper,

(b) effects of non-normality on statistical procedures based on (censored) Normal
distributions, 

(c) comparison of two or more censored samples, 

(d) extension of these methods to samples censored by fixed point(s) (Gupta’s method I) 
and to truncated samples. 
Table 1. Cumulants of $F(x_r)$

Note. (i) r < s < t < v.

(ii) For convenience we use a notation typified by the relationship

$$\kappa_{112}(F_r, F_s, F_t) = \kappa(rst^2).$$

$$\kappa(r) = p_r,$$
$$\kappa(r^2) = \frac{p_r q_r}{n+2}, \qquad \kappa(rs) = \frac{p_r q_s}{n+2},$$
$$\kappa(r^3) = \frac{2p_r q_r(q_r - p_r)}{(n+2)(n+3)}, \qquad \kappa(r^2 s) = \frac{2p_r q_s(q_r - p_r)}{(n+2)(n+3)},$$
$$\kappa(rs^2) = \frac{2p_r q_s(q_s - p_s)}{(n+2)(n+3)}, \qquad \kappa(rst) = \frac{2p_r q_t(q_s - p_s)}{(n+2)(n+3)},$$
* 6,9 a <p 2 ee 
K(r*) = (n+2)(n+3)(n44) | (a Pr) nes pete |, 
< 6, iain SO 
k(7°8) = (n42)(n+3)(n44) [ Pr) meet | 
67,9. n+3 
3 -" any cuaneceneat 
k(rs*) = (n+2)(n4+3)n44) [ P.)* nee Pats 
202 i Sack Ge ~~ 4-33 ie pare at) ea! 
k(7*8*) = (n+ 2) (n+3)(n+4) [ 6 Pr) (Ys— Ps) + +2 WrPs 
ae Prd Ss i 2(n + 1) = 
k(r’st) = (nt2)(n+3)(n44) [ Pr) (Vs—Ps) + man "Pe 
6p,% n+3 
2. _ => casas 
K(rs*t) = (n+ 2) (n+ 3)(n+4) G Ps)” ne Pate | 
ke Pre 2(n+1) 4(2n + 5) 
K(rst*) = (n+2)(n+3)(n 44) 6(q.— Ps) (%— Pt) + > ee 
( wiv geee oes | * ” 2(n+ 1) ~ 
K(rstv) = (n+ 2)(n+3)(n+4) [6 Ps) (U%— Pe) + nag 4h 


Table 2. Cumulants of ordered variables (to order $(n+2)^{-3}$)


Note. (i) r < s < t < v.








(ii) For convenience we use a notation typified by the relationship 


Pr = 1—gq, 


&(2,) = X,+ PrUr X%+ PrWr 


2 
Ky19(Vps Vey Xz) = K(XpX, Xz). 


y Xp 
=—_, dF = ‘ ‘= r aF. 
ee a (2) =p; XX, = GX,/ 





2(n + 2) (n+ 2) 











PrUr m 
~ ) Xe + 
+ int 2) ([—4(9,— Pr) #{(q 
2 PrQr , PrUr 
felt SEF — X¢ , 
K(x;) = n+2 tin+op? 
PrUr 
(n+ 2) 


+299, — Pr) (EX; Ae + 3X7 aa +4 } p24? oem v4 ob 2X¢ XV + §X?'2)], 


— [—2(9,—p,) X¢ Xe + {(G-— 


[34+ — Pr) Xp +32-9rXP] 


(Ge — Pr) Xp Xe + Dp Ge( Xp Xe + 4$X7%)] 


— PrQp} (2X_Xy + $X7*) 


— P74} XP + 34, PAG — Pr) X7 + dep? gs Xe'], 


i 
| 
, 








TT 





K(2, Zs) = 


K(x?2,) = 


lap a8) — 
K(XpX5) = 


K (ayy) = 
Table 2 (cont.)











P. qs aie a7 PrQs awe wr Wy 7 - Ul a 
= gaeXs + (n ss 2)2 [(9r— Pr) X; X,+ (Ge— Ps) XX; +49,9,X¢ Xi+ 47.9,.X;X7 +42-9,.X7 xX?) 
PrYs Bis > $4 re xX’ 2 Pid xX? 
(n+ 2)l— (&e—Pr) + Xa (Qs—Ds) XeXe + {Gr — Pr)? — Per} Xe Xs 


+{(4s— Ps)? — 09s} X¢-Xs +{HGr—Pr) (Gs—Ps) + 400% — 2 Gah Xe Xe 

+ $ Pr 9r( Gp — Pr) XP X45 + $P.9s(Ve— Ps) X¢-Xs + {Pr Is\Gr — Pr) + $PrIe(Gs— Ps)} Xe Xe 
+{PrIQs— Ps) +4259 Gr — Pr)} Xz Xe + bpege Xs Xi+ hpige XX; 
+4p79,9,.X7 Xo +40,p.92 XrXe + e(2p?gs + 39,9, 259s) Xr Xe] 





Prd , rey" 

n+ 2) [2(¢, — Pr) Xx? a 39,9, X7?X 7] 
Prd , , ” 720 ower 
(n+ 291 2% —Pr) XF? + 9{(9, — Pr)? — Pr Gr} X72 X47 + 3p,9-(9,— Dr) (BX 2X, +4X,XF?) 


+ p2g?(3Xs2.X¥ + 6X4,X7 X71 + X73)], 








Prd 7 7 , 4 , ; 44 
/ : 5 (21% — Pr) Xp? XG + 2p, g, Xe Xe X5+p,9,Xe" Xs] 
(n+ 2) 
Prs , ‘aa 2 2 7 ” , 
+ (n +2)8 [ _ 2(dr — Pr) XP Xe+ 6{(q, — Pr)* —PrQr} XX? Xs 


+ {3(9,— Pp) (Qs— Ps) + Ue Ps— 49s} Xp? X5 + Wy Tel Ge — Pr) (BX¢X eX + 2Xy?X5) 
+ {2p,95(9s —P,) +7595 = Pr)} XPxs 7 {6p,94(9 —Pr) + 20, 9r(Ys —Ps)} X,X; Xe 
+72, 9 2X; Xp Xz +X2X3) + pegl2X; Xp X,+X;X,X7) 

+(PFQS+ PrGr Pes) XeX eX + bP P eG Xe? Xs), 








PrQs rr) ye yr ye wy 
(n . 2)2 [2(¢,— Ds) Xe Xe + 2p.q,X-X, Xs +P9sX¢ X3*) 
P qs “aa ¢ " rr Wws ” a” 7 
Gn ght Ade Ps) Xr XV? + 6(G.— Ps)? — Pode) XXX; + {3(9, — Dr) (Ys— Ps) + PsQr — 4Pr Ge} X X?* 


+ 2.96(s— Ps) (3X;-X, Xs + 2X, Xf?) + {2p,95(G — Pr) + Pr Ils — Ps)} Xe XP 
+ {6p,9s(9s— Ds) + 2059s %- — Pr)} Xp XE XS + Pp PsGe(2X7XGXy + X7Xs*) 
+ pi q3(2X_X5Xy + Xp, XEXY) + (pg + Pde Peds) Xr XoXs +h IGeXr Xs"), 


Prd. , , re “wh "aa , ” 7 , re ” 
in $82 Ps) XXX t+ Pee X 7, XG Xt + PoGeXe Xs Xi + PsUXrAX7] 


PrY 9 riwry “aww 
n+ 2)° [ oe 2(¢5 —P,) Xx rXe2 t+ {3(¢, —P,) (Ys — Ps) + 9pPs— 4p,Qs} xX; Xi Xt 


+ 3{(q,—Ps)* — P59} X- Xs Xt + {3(Ge— Ps) (Ge — Pt) + UePt — 46%} Xe AXGXG 

+ {20,96 Gp — Pr) + PrGe(Gs— Ps)} Xp Xi Xt + 3,96 Ie — Ps) Xp Xe Xt 

+ {2,9 Ge— Pe) + P29 Gs — Ps)} Xe XEXe + {3M-GGs— Ps) + Pe Vel Gr — Pr)} Xe Xs Xt 

+{PrQs(Ue— Pt) + Ps Qe Gr — Pr) + 2Pr We — Ps)} Xe XEXF + {eG Ue — Pt) + 3.9 Gs — Ps)} X--Xe Xe 
+3779,9.X¢ Xi Xi+ dpi gi X, XP Xi + 4 p.49% Xe XAT + Pes + PrG Ps) Xe Xe Xe 

+4 P?90%+ PrQr Po) Xp XXs +P, PG Xz Xy Xi + VGA Xe XF 

+3 PpPeGe + PrVePeG) Xt Xe Xe + H VIG + PoGsPe%) Xe Xs Xt + Pp PoVeUXe Xs XF, 


~ 
$\kappa(x_r^4) =$

$\kappa(x_r^3 x_s) =$

$\kappa(x_r^2 x_s^2) =$

$\kappa(x_r x_s^3) =$

$\kappa(x_r^2 x_s x_t) =$

$\kappa(x_r x_s^2 x_t) =$

$\kappa(x_r x_s x_t^2) =$

$\kappa(x_r x_s x_t x_v) =$
Table 2 (cont.)


a 2 ah [6{(q,—P,)*— p,9,} X44 + 24p,4¢(4e — P,) XI X72 + 4p2q XP X" + 3X/2Xr)], 








P. q , 7 4 a 4 / wn” 
Table 3. Sums of powers and products of the first k natural numbers
SIMPLIFIED DECISION FUNCTIONS


By G. A. BARNARD 
Imperial College 


The recent work of Wald (1950) and Lindley (1953) in the theory of statistical decision 
functions is of the utmost importance to practising statisticians, but it seems at present 
rather inaccessible to these latter because of the formidable mathematical apparatus with 
which it appears to be connected. Our object in this paper is to give a reasonably simple 
proof of a result closely related to that proved by Lindley, and indirectly related to Wald’s 
central theorems. Our result can also be regarded as a generalization of Neyman & Pearson’s 
fundamental lemma (1933), and it thus forms a bridge from their work to the later work of 
Wald and Lindley. We arrive at our result by means of two lemmas, one geometrical and 
one statistical. The geometrical lemma is a purely mathematical result, and its proof would 
not have been given in this paper had it been longer or more complicated than it is. The 
reader who wishes to concentrate on the statistical argument can omit the section devoted 
to this lemma. The statistical lemma, on the other hand, is of some interest in itself as giving 
a direct frequency interpretation of a likelihood ratio. 

Our result on decision functions is essentially a slight extension of that first derived by 
Lindley, but for simplicity we have restricted ourselves to the case where only two decisions 
are possible. In this respect our results could be extended at the expense of tedious but 
inessential complication. Our methods of proof, too, have been derived from those of 
Lindley, by omission of these complications, by use of our geometrical lemma, and by 
simplification caused by our adoption from the beginning of the notion of a randomized 
decision function. Lindley went to some trouble to avoid randomized decisions as much 
as possible (they cannot be altogether avoided), in what now seems to the present writer 
a misguided endeavour. If we confuse the decision problem with the inference problem, 
randomization appears as a thing to be avoided. It is unreasonable that the conclusion 
to be drawn from a set of data should depend on the totally irrelevant throw of a coin; 
but it is not unreasonable that the action to be taken on the basis of certain facts should
be decided by the throw of a coin, if the pros and cons are very evenly balanced. 


A FORMULATION OF PROBABILITY THEORY 

In any work on general probability theory one is harassed by the difficulty that everything 
has to be said twice, once for discrete and once for continuous variates. One might say, 
three times, if one adds the multivariate case. It has for some time been customary to bring 
the first two cases together by using the cumulative distribution function instead of the 
probability (density) function; but this, besides being of no particular help in the multi- 
variate case, is excessively awkward when problems of inference or decision are involved. 
We therefore propose another way out, by a reinterpretation of existing symbolism. 

We propose to take as our general expression for the probability of a proposition A
depending on a single trial

$$\Pr\{A \mid \phi\} = \int A(x)\,\phi(x)\,dx,$$

where x is a variable ranging over the set of values possible for the variate X under discussion,
$\phi(x)$ is a continuous function of x, called the probability function, and A(x) is
a function of x defined by saying that A(x) is 1 when A is true and is 0 when A is false. The
meaning of the pair of symbols

$$\int \ldots\, dx$$

depends on the nature of the range of x. If x is a single real variable, these symbols have
their ordinary meaning (the range being from $-\infty$ to $+\infty$). If x is restricted to being
a positive whole number, they are to be taken as meaning

$$\sum_{x=1}^{\infty} \ldots,$$

while if, for example, x represents a point $(x_1, x_2, x_3)$ in continuous three-dimensional space,
they are to be taken as meaning

$$\iiint \ldots\, dx_1\,dx_2\,dx_3.$$


If the truth of a proposition B depends on the results of two (independent) observations on
X, its ‘characteristic function’ B(x, y) will be a function of two variables x, y, corresponding
to the results of the two trials; B(x, y) will again, by definition, take the value 1 when B is
true and 0 when B is false. For the probability of such a proposition we shall need to interpret

$$\iint \ldots\, dx\,dy.$$

We do this in the obvious manner. If, for example, x is the point $(x_1, x_2, x_3)$, while y is the
point $(y_1, y_2, y_3)$, then we take

$$\iint \ldots\, dx\,dy \quad\text{to mean}\quad \int\!\!\int\!\!\int\!\!\int\!\!\int\!\!\int \ldots\, dx_1\,dx_2\,dx_3\,dy_1\,dy_2\,dy_3,$$

and similarly in other cases. We then have

$$\Pr\{B \mid \phi\} = \iint B(x, y)\,\phi(x)\,\phi(y)\,dx\,dy,$$

since, by the multiplication law of probability, the probability function for two independent
trials (x, y) is the product $\phi(x)\,\phi(y)$ of the probability functions for each trial separately.
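
For a variate with a finite range the 'integral' is simply a sum, and the formulation can be illustrated with a small sketch. The helper `pr` and the fair-die example are our own illustrative assumptions, not part of the original argument; exact fractions are used so that the results can be checked by hand.

```python
from fractions import Fraction
from itertools import product


def pr(indicator, phi, trials=1):
    """Pr{A | phi}: the 'integral' of A(x) phi(x) dx read as a finite sum.

    `phi` maps each possible value of the variate to its probability and
    `indicator` returns 1 when the proposition is true and 0 when it is false.
    For several independent trials the probability function is the product
    of the probability functions for the separate trials.
    """
    total = Fraction(0)
    for xs in product(phi, repeat=trials):
        weight = Fraction(1)
        for x in xs:
            weight *= phi[x]
        total += indicator(*xs) * weight
    return total


# Hypothetical example: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

p_even = pr(lambda x: int(x % 2 == 0), die)                 # A: "x is even"
p_seven = pr(lambda x, y: int(x + y == 7), die, trials=2)   # B: "x + y = 7"
```

Here `p_even` is 1/2 and `p_seven` is 1/6, the second being a proposition B(x, y) that depends on two independent trials.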

For the mathematically minded reader we can sum this up by saying that, in accordance
with modern practice, we are taking

$$\int \ldots\, dx$$

to represent a Stieltjes integral with respect to a measure supposed defined in the space of X.
We depart from modern practice among probabilists by insisting that in all cases arising
in practice this space may be taken to be a topological space, and that $\phi(x)$ can be taken
continuous in this topology, while the given measure is defined and non-null for all non-empty
open sets. The consideration of more general cases, we would argue, belongs to pure
mathematics, not to statistics as a branch of applied mathematics. Perhaps the principal
logical advantage of our approach is based on the fact that for us the probability function
is defined everywhere, whereas for the more general probabilistic approach the probability
function is defined only almost everywhere.
A GEOMETRICAL LEMMA


In this section (not in other sections) $x = (x_1, x_2, \ldots, x_n)$ will be taken to represent a point in
n-dimensional vector space. If x, y are points in this space, and $\alpha$ and $\beta$ are real numbers,
the point $\alpha x + \beta y$ is that with co-ordinates $(\alpha x_1 + \beta y_1, \alpha x_2 + \beta y_2, \ldots, \alpha x_n + \beta y_n)$, and the
‘inner product’ x.y denotes the real number

$$x.y = x_1 y_1 + x_2 y_2 + \ldots + x_n y_n.$$

We call the positive aliquant (cf. quadrant, octant) the set of all points none of whose
co-ordinates are negative, and the negative aliquant the set of all points none of whose
co-ordinates are positive. These aliquants are not supposed to contain the origin (0, 0, ..., 0).
A set of points C is said to be convex if, whenever it contains the points x and y, it contains
all points on the line segment joining x to y, i.e. all points $\theta x + (1-\theta) y$ with $0 \le \theta \le 1$.

A set C is said to ‘have the property P’ if (a) C is convex, (b) no point of C is interior to
the negative aliquant. We now prove the

Lemma. If C has the property P, there exists a point u in the positive aliquant such that

$$u.x \ge 0$$

for all x in C.

Proof. We use induction on the dimension number n. Assume first that n = 2 and consider
the points $(\cos\alpha, \sin\alpha)$ on the unit circle in the first quadrant, $0 \le \alpha \le \tfrac{1}{2}\pi$. We put $\alpha$ into
class L if there is an x in C in the second quadrant such that $x_1\cos\alpha + x_2\sin\alpha < 0$, and we
put $\alpha$ into class U if there is an x in C in the fourth quadrant such that $x_1\cos\alpha + x_2\sin\alpha < 0$.
Then if $\alpha$ is in L and $\beta$ is in U, we must have $\alpha < \beta$. For otherwise, if x is a point of C which
shows $\alpha$ to be in L, and y is a point of C which shows $\beta$ to be in U, the segment joining x to
y passes through the interior of the third quadrant, contra hypothesis. Further, since
$x_1\cos\alpha + x_2\sin\alpha$ is a continuous function of $\alpha$, the sets L and U are both open, i.e. L has no
greatest member and U has no least. It follows that there must be at least one $\alpha$ in neither
class, and for this $\alpha$ we have

$$x_1\cos\alpha + x_2\sin\alpha \ge 0$$

for all x in C in the second, fourth and first quadrants, i.e. for all x in C.


Now assume the lemma true for dimensions 2, 3, ..., (n − 1), and suppose C to be a set in
n dimensions. We define an (n − 1)-dimensional set $C^-$ by saying that $(x_1, x_2, \ldots, x_{n-1})$
belongs to $C^-$ if and only if, for some $x_n < 0$, $(x_1, x_2, \ldots, x_{n-1}, x_n)$ belongs to C. It is easily seen
that $C^-$ has the property P, so that there exist, by the inductive hypothesis, $v_1, v_2, \ldots, v_{n-1}$,
non-negative and not all zero, such that

$$v_1 x_1 + v_2 x_2 + \ldots + v_{n-1} x_{n-1} \ge 0$$

for all $(x_1, x_2, \ldots, x_{n-1})$ in $C^-$. Thus if v denotes the point $(v_1, v_2, \ldots, v_{n-1}, 0)$, we have for all
points x in C, either $x_n \ge 0$, or $v.x \ge 0$, or both. Thus the two-dimensional set of points
$(v.x, x_n)$, where x ranges over C, satisfies condition (b). It is easily seen to be convex, and so,
by the first part, there exist non-negative $w_1, w_2$, not both 0, such that for all x in C,

$$w_1(v.x) + w_2 x_n \ge 0.$$

Thus we can here take u to be the point $(w_1 v_1, \ldots, w_1 v_{n-1}, w_2)$ and the lemma is proved.
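
The two-dimensional step of the proof can be made concrete for a finite set of points, whose convex hull plays the part of C. The sketch below is ours and rests on that assumption; the name `separating_direction` is hypothetical. It computes the least upper bound of class L and the greatest lower bound of class U directly and returns $u = (\cos\alpha, \sin\alpha)$ for an $\alpha$ lying between them.

```python
import math


def separating_direction(points):
    """Return u = (cos a, sin a), 0 <= a <= pi/2, with u.x >= 0 for every point.

    `points` is a finite list of (x1, x2) pairs whose convex hull is assumed
    to have the property P. A ValueError is raised when no such u exists,
    i.e. when the hull meets the interior of the negative quadrant.
    """
    lo, hi = 0.0, math.pi / 2
    for x1, x2 in points:
        if x1 < 0 and x2 < 0:
            raise ValueError("a point lies in the interior of the negative quadrant")
        if x1 < 0:
            # Second quadrant: x1 cos a + x2 sin a >= 0 needs tan a >= -x1 / x2.
            lo = max(lo, math.atan2(-x1, x2))
        elif x2 < 0:
            # Fourth quadrant: x1 cos a + x2 sin a >= 0 needs tan a <= x1 / (-x2).
            hi = min(hi, math.atan2(x1, -x2))
    if lo > hi:
        raise ValueError("the convex hull meets the interior of the negative quadrant")
    a = 0.5 * (lo + hi)
    return math.cos(a), math.sin(a)


pts = [(-1.0, 2.0), (3.0, -1.0), (1.0, 1.0)]
u = separating_direction(pts)
dots = [u[0] * x1 + u[1] * x2 for x1, x2 in pts]
```

Every entry of `dots` is non-negative for this example. For the pair of points (-1, 1) and (1, -2), whose joining segment crosses the open third quadrant, the function raises an error instead.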


A STATISTICAL LEMMA


Now let us consider situations where we are able to make independent observations on
a variate X whose probability function $\phi$ is known to be one of a finite number of functions
$\phi_0, \phi_1, \ldots, \phi_k$. We wish to decide whether or not to take a certain action A which is desirable
if $\phi = \phi_0$, but which is undesirable if $\phi = \phi_i$ for i = 1, 2, ..., k. In Neyman & Pearson’s
terminology we might be said to be ‘testing the hypothesis’ $\phi = \phi_0$ against the set of
alternatives $\phi = \phi_i$, and action A would then correspond to ‘acceptance’ of the hypothesis
being tested.

Any rule for making the decision will consist of two parts: 

(i) a rule for deciding when to stop taking observations, 

(ii) a rule for deciding whether or not to take action A, given that observations have ceased. 

We call (i) the sampling rule, and (ii) the (final) decision rule. In making decisions in 
practical life we often find the pros and cons so evenly balanced that the only way of arriving 
at a definite decision is to ‘toss for it’. This means that we allow our decision to depend not 
only on the relevant data, but also on the result of an irrelevant random experiment. We 
allow for this possibility in our theory, partly because of its occurrence in practical cases, 
but mainly for its mathematical convenience. With this in mind we can say that a sampling 
rule will be given by a sequence of functions $s_1, s_2, \ldots, s_r, \ldots, s_n$, such that

(i) $s_r$ is a function of r variables $(x_1, x_2, \ldots, x_r)$ such that $0 \le s_r(x_1, x_2, \ldots, x_r) \le 1$ for all $x_i$ and
r = 1, 2, 3, ...;
(ii) $s_n(x_1, x_2, \ldots, x_n) = 1$ for all $x_i$.

$s_r(x_1, x_2, \ldots, x_r)$ is to be interpreted as the probability, given that r observations have
been made, with results $x_1, x_2, \ldots, x_r$, that observations will then cease. Thus condition (ii)
implies that observations will certainly cease at the nth observation, if not before, so that
n is an upper bound to the sample size, which exists in all practical cases. Our argument is
made slightly simpler by assuming that n is finite.

The probability that the total number of observations taken is r is the probability
function $\phi(x_1)\,\phi(x_2) \ldots \phi(x_r)$ for the sequence $x_1, x_2, \ldots, x_r$, multiplied by the probability of
stopping observations at this point, integrated over all possible sequences of length r, i.e.

$$\int\!\!\int\!\cdots\!\int (1-s_1)(1-s_2)\ldots(1-s_{r-1})\,s_r\,\phi(x_1)\ldots\phi(x_r)\,dx_1\ldots dx_r.$$



Turning now to final decisions, we formally define:

A final decision rule is a sequence of functions $d_1, d_2, \ldots, d_r, \ldots, d_n$, such that for all sequences
of values of x, $0 \le d_r(x_1, x_2, \ldots, x_r) \le 1$ for r = 1, 2, ..., n.

Such a rule has the interpretation that, if the values $x_1, x_2, \ldots, x_r$ have been observed for
X, and if the sampling rule has given the result that observations are to cease, then a random
experiment is to be performed in which the probability of ‘success’ is $d_r(x_1, x_2, \ldots, x_r)$, and
if ‘success’ is obtained in the experiment, then action A is to be taken, and not otherwise.
We then have, for the probability of action A, the total probability of taking action A after
one observation, after two observations, ..., after n observations:

$$\Pr\{A \mid \phi\} = \int d_1 s_1\,\phi(x_1)\,dx_1 + \iint d_2(1-s_1)\,s_2\,\phi(x_1)\,\phi(x_2)\,dx_1\,dx_2 + \ldots$$
$$\qquad + \int\!\!\int\!\cdots\!\int d_n(1-s_1)(1-s_2)\ldots(1-s_{n-1})\,s_n\,\phi(x_1)\,\phi(x_2)\ldots\phi(x_n)\,dx_1\,dx_2\ldots dx_n.$$
We now concentrate attention on the ‘hypothesis being tested’, $\phi_0$, and one of the
alternatives $\phi_i$. Following Fisher we define the likelihood ratio for the sequence of observations
$x_1, x_2, \ldots, x_r$ to be

$$l_{i0}(x_1, x_2, \ldots, x_r) = \phi_i(x_1)\,\phi_i(x_2)\ldots\phi_i(x_r)/\phi_0(x_1)\,\phi_0(x_2)\ldots\phi_0(x_r),$$

and adapting Neyman & Pearson we define the

$$\text{Odds of error} = \frac{\text{Probability of wrongly taking action } A \text{, when } \phi = \phi_i}{\text{Probability of rightly taking action } A \text{, when } \phi = \phi_0} = \Pr\{A \mid \phi_i\}/\Pr\{A \mid \phi_0\};$$

or, in the customary notation of ‘errors of the first and second kinds’, $\beta/(1-\alpha)$. With these
definitions our statistical lemma can be stated:

Odds of error = Conditional mean value of likelihood ratio, given action A correctly taken.

The proof is immediate. Both sides can be seen to be equal to

$$\frac{\sum_{r=1}^{n}\int\!\cdots\!\int l_{i0}\,d_r(1-s_1)(1-s_2)\ldots(1-s_{r-1})\,s_r\,\phi_0(x_1)\ldots\phi_0(x_r)\,dx_1\ldots dx_r}{\sum_{r=1}^{n}\int\!\cdots\!\int d_r(1-s_1)(1-s_2)\ldots(1-s_{r-1})\,s_r\,\phi_0(x_1)\ldots\phi_0(x_r)\,dx_1\ldots dx_r},$$

since $l_{i0}\,\phi_0(x_1)\ldots\phi_0(x_r) = \phi_i(x_1)\ldots\phi_i(x_r)$.
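
The lemma can be checked exactly on a small discrete example. Everything specific in the sketch below is a hypothetical choice of ours: a two-valued variate, the probability functions `PHI0` and `PHI1`, a sampling rule `s` that may stop early with probability 1/2, and a final decision rule `d`. Both sides of the lemma are computed by enumerating all sequences of length up to n = 3.

```python
from fractions import Fraction
from itertools import product
from math import prod

N = 3                     # upper bound n on the sample size
SUPPORT = (0, 1)          # possible values of the variate
PHI0 = {0: Fraction(1, 2), 1: Fraction(1, 2)}   # hypothesis being tested
PHI1 = {0: Fraction(1, 5), 1: Fraction(4, 5)}   # one alternative


def s(xs):
    """Sampling rule: probability that observations cease after xs."""
    if len(xs) == N:
        return Fraction(1)
    if len(xs) == 2 and xs[0] == xs[1]:
        return Fraction(1, 2)
    return Fraction(0)


def d(xs):
    """Final decision rule: probability of taking action A at xs."""
    ones, zeros = sum(xs), len(xs) - sum(xs)
    if ones < zeros:
        return Fraction(1)
    if ones == zeros:
        return Fraction(1, 2)
    return Fraction(0)


def density(phi, xs):
    """Probability function of the whole sequence xs."""
    return prod(phi[x] for x in xs)


def stop_and_act(xs):
    """d_r (1 - s_1) ... (1 - s_{r-1}) s_r evaluated along the sequence xs."""
    keep_going = prod(1 - s(xs[:j]) for j in range(1, len(xs)))
    return d(xs) * keep_going * s(xs)


sequences = [xs for r in range(1, N + 1) for xs in product(SUPPORT, repeat=r)]

pr_A_0 = sum(stop_and_act(xs) * density(PHI0, xs) for xs in sequences)
pr_A_1 = sum(stop_and_act(xs) * density(PHI1, xs) for xs in sequences)
odds_of_error = pr_A_1 / pr_A_0

# Conditional mean of the likelihood ratio, given action A taken under PHI0.
cond_mean_lr = sum(
    (density(PHI1, xs) / density(PHI0, xs)) * stop_and_act(xs) * density(PHI0, xs)
    for xs in sequences
) / pr_A_0
```

With exact fractions the two quantities `odds_of_error` and `cond_mean_lr` coincide, whatever rules `s` and `d` are substituted, provided `s` returns 1 for sequences of length n.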

Although it is so simple, this lemma has many applications. For example, if the decision
rule is such that the likelihood ratio is always less than a fixed number L whenever action
A is taken, the conditional mean value must be less than L, and so must the odds of error be.
This is a slight sharpening of an inequality due to C. A. B. Smith. It provides a justification
for ‘equating’ the likelihood ratio to the odds of error. If action A consists in asserting that
$\phi_0$ is ‘moderately likely’, and we say this when the likelihood ratio lies between 1/20 and
1/10, then in the long run the ratio of the relative frequency of saying $\phi_0$ is moderately likely
when it is false (and $\phi_i$ is true), to the relative frequency of saying $\phi_0$ is moderately likely
when it is true, lies between 1/20 and 1/10. The special interest of this frequency interpretation
of the likelihood ratio lies in the fact that, like the likelihood ratio itself, it is independent
of the ‘reference set’.



If we now turn from considering one alternative hypothesis $\phi_i$ to the whole set of
alternatives i = 1, 2, ..., k, it is convenient to introduce a set of non-negative weights $w_i$,
such that $\sum_i w_i = 1$, and then to define the weighted odds of error as

$$\sum_i w_i \Pr\{A \mid \phi_i\}/\Pr\{A \mid \phi_0\},$$

and the weighted likelihood ratio as

$$l_w = \sum_i w_i\, l_{i0}.$$

Then since the mean value of a weighted mean is the weighted mean of the mean values, we
have that the weighted odds of error will be equal to the conditional mean value of the
weighted likelihood ratio. Then, as a direct application of this result, we may mention that
if we arrange the decision rule so that action A is taken only when the weighted likelihood
ratio is less than a fixed number L, the weighted odds of error will then be less than L. In
particular, since a weighted mean of a set of quantities lies between the highest and lowest
of these quantities, it follows that if the probabilities of error are all equal, they will all be
less than L.
ADMISSIBLE DECISION RULES AND BAYES RULES


If we now adopt Neyman & Pearson’s terminology, and allow the action A to correspond to
acceptance of the hypothesis $\phi_0$, then for any given decision rule the graph of the odds of
error for $\phi_i$ against the suffix i will represent what they call the power curve of the decision
rule, inverted and scaled up in the ratio $1 : \Pr\{A \mid \phi_0\} = 1 : (1-\alpha)$, in the usual notation.
In this section we shall follow Neyman & Pearson in considering that the sampling rule is
fixed, and the probability of rightly accepting $\phi_0$, $\Pr\{A \mid \phi_0\}$, is also fixed at a value p, and
then we compare different decision rules on the basis of their power curves.

If the power curve for a decision rule D never lies below the power curve for a decision
rule D′, and sometimes lies above it, we say that the decision rule D is uniformly more
powerful than D′, or, in Lindley’s terms, D is uniformly ‘better’ than D′. And we say that
D is admissible if no rule exists which is uniformly better, or more powerful, than it. In other
words, D is admissible if there is no rule D′ for which, simultaneously

$$\Pr\{A \mid D', \phi_0\} = \Pr\{A \mid D, \phi_0\} = p,$$

and for all i, $\Pr\{A \mid D', \phi_i\} \le \Pr\{A \mid D, \phi_i\}$,
while for some i, $\Pr\{A \mid D', \phi_i\} < \Pr\{A \mid D, \phi_i\}$.
If D is inadmissible, then a uniformly more powerful rule D′ exists, and the only possible
reason for using D rather than D′ to decide one’s course of action would be that D involved
less computation than D′, or something similar. On the other hand, if D is admissible no
conclusive argument based solely on the frequency of errors can be brought against its use.
We now proceed to characterize the class of all admissible rules. If this class turns out to
have only one member, this rule will undoubtedly be the one to use (apart from considerations
of ease of computation etc.); it will in fact be the uniformly most powerful rule. In general,
of course, the class of all admissible rules will have more than one member.

Let D be a fixed admissible final decision rule, and let D′ range over all possible final
decision rules such that

$$\Pr\{A \mid D, \phi_0\} = \Pr\{A \mid D', \phi_0\} = p,$$

and consider the set C of points x in k-dimensional vector space such that the ith co-ordinate is

$$x_i = \Pr\{A \mid D', \phi_i\} - \Pr\{A \mid D, \phi_i\}.$$

Then C has the property P of our geometrical lemma. For if D′, corresponding to the point
x in C, has typical function $d_r'$, and D″, corresponding to the point x′ in C, has typical
function $d_r''$, then the point $\theta x + (1-\theta)x'$, for $0 \le \theta \le 1$, corresponds to the decision rule
$\theta D' + (1-\theta)D''$, with typical function $\theta d_r' + (1-\theta)d_r''$, and for this decision rule we have

$$\Pr\{A \mid \theta D' + (1-\theta)D'', \phi_0\} = \theta\Pr\{A \mid D', \phi_0\} + (1-\theta)\Pr\{A \mid D'', \phi_0\} = \theta p + (1-\theta)p = p.$$

Hence C is convex. It has no points in the negative aliquant because D is admissible. And
so, from the geometrical lemma, it follows that there is a point u in the positive aliquant for
which

$$u.x \ge 0$$

for all x in C. If we take $w_i = u_i/\sum_i u_i$ this means that non-negative weights $w_i$ exist, with
$\sum_i w_i = 1$, such that for all D′ of the class considered

$$\sum_i w_i \Pr\{A \mid D, \phi_i\} \le \sum_i w_i \Pr\{A \mid D', \phi_i\} \qquad (i = 1, 2, \ldots, k),$$
or, dividing by p, we see that D minimizes the weighted odds of error among all rules giving
the same probability of rightly taking action A.

We shall call a rule which minimizes the weighted odds of error, for some choice $w_i$ of
weights, a Bayes rule. We have just seen that all admissible rules are Bayes rules. We now
proceed to justify the term ‘Bayes rule’, and to show that in a sense the converse is true:
‘almost all’ Bayes rules are admissible.

First we notice that minimizing the weighted odds of error is, by the statistical lemma,
equivalent to minimizing the conditional mean value of the weighted likelihood ratio $l_w$.
And since we are considering the sampling rule as fixed, the likelihood ratio $l_w$ can be regarded
as the value of a variate $\Lambda$ having a definite distribution, with cumulative function

$$\Pr\{\Lambda \le \lambda \mid \phi_0\} = F(\lambda).$$

We evidently minimize the conditional mean value of $\Lambda$ by picking out the smallest values
first, i.e. by choosing $d_r(x_1, x_2, \ldots, x_r)$ so that

$$d_r = 1 \text{ when } \Lambda \le \lambda(p), \qquad d_r = 0 \text{ otherwise},$$

where $\lambda(p)$ is such that $F(\lambda(p)) = p$. If the distribution of $\Lambda$ is not continuous, such a $\lambda(p)$
may not exist; if so, there will be a $\lambda$ such that

$$\Pr\{\Lambda < \lambda \mid \phi_0\} \le p, \quad\text{while}\quad \Pr\{\Lambda \le \lambda \mid \phi_0\} > p.$$

In such a case we choose $\theta$ so that

$$\Pr\{\Lambda < \lambda \mid \phi_0\} + \theta\Pr\{\Lambda = \lambda \mid \phi_0\} = p,$$

and then put

$$d_r = 1 \text{ when } \Lambda < \lambda, \qquad d_r = \theta \text{ when } \Lambda = \lambda, \qquad d_r = 0 \text{ otherwise}.$$

Alternatively, when $\Lambda = \lambda$ we may give $d_r$ any set of values whose conditional mean is $\theta$,
instead of putting $d_r = \theta$.
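
For a discrete distribution the cut-off $\lambda$ and the randomizing probability $\theta$ can be found by accumulating probability from the smallest likelihood ratios upwards. The sketch below is ours: the function name `bayes_cutoff` and the two-trial example are hypothetical, and the distribution of $\Lambda$ under $\phi_0$ is supplied as a table.

```python
from fractions import Fraction


def bayes_cutoff(dist, p):
    """Find (lam, theta) with Pr{L < lam} + theta * Pr{L = lam} = p.

    `dist` maps each possible value of the weighted likelihood ratio to its
    probability under phi_0; `p` is the required probability of rightly
    taking action A, with 0 < p <= 1.
    """
    below = Fraction(0)            # Pr{L < lam} accumulated so far
    for lam in sorted(dist):
        mass = dist[lam]           # Pr{L = lam}
        if below + mass >= p:
            theta = (p - below) / mass
            return lam, theta
        below += mass
    raise ValueError("p exceeds the total probability in dist")


# Hypothetical example: two trials, success probability 1/2 under phi_0 and
# 3/4 under the single alternative, so the ratio is 3**k / 4 for k successes.
dist = {
    Fraction(1, 4): Fraction(1, 4),
    Fraction(3, 4): Fraction(1, 2),
    Fraction(9, 4): Fraction(1, 4),
}
lam, theta = bayes_cutoff(dist, Fraction(1, 2))
```

Here `lam` is 3/4 and `theta` is 1/2: action A is taken outright when no success is observed, and with probability 1/2 when exactly one success is observed, giving $\Pr\{A \mid \phi_0\} = 1/4 + 1/4 = 1/2$.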

Now since $w_i$ is non-negative and $\sum_i w_i = 1$, the weights $w_i$ could be thought of as corresponding
to prior probabilities, $w_i$ being the prior probability of $\phi_i$, given that one of the
alternatives $\phi_i$ is true. Then if p is the prior probability of $\phi_0$, and q = 1 − p, we have, by
Bayes’s theorem, when the results $x_1, x_2, \ldots, x_r$ have been observed,

$$\text{Posterior odds against } \phi_0 = q\sum_i w_i\,\phi_i(x_1)\ldots\phi_i(x_r)\big/p\,\phi_0(x_1)\ldots\phi_0(x_r) = q\,l_w/p.$$

Thus it will be odds on that $\phi_0$ is true if $l_w < p/q$
and it will be evens if $l_w = p/q$.

Thus if we had a person using Bayes’s theorem, with this assessment of the prior probabilities,
and with the rule that he was prepared to take action A whenever the odds favoured $\phi_0$, and
to ‘toss for it’ when the odds were even, such a person would use the same rule as that
obtained by minimizing the weighted odds of error, provided p/q were taken equal to $\lambda$, and
the ‘toss’ were such that the probability of ‘success’ was the $\theta$ given in the preceding
paragraph. It would be impossible, on the basis of an objective study of his behaviour in
situations such as we are considering, to differentiate between such a person and a person
who acted on the rule of the preceding paragraph. This is why we call such rules ‘Bayes
rules’.
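
The identity between the posterior odds and the weighted likelihood ratio can be checked directly. In the sketch below the two function names, the two alternatives, the weights and the prior probability are all hypothetical choices of ours; the posterior odds are computed from Bayes's theorem and compared with $q\,l_w/p$.

```python
from fractions import Fraction
from math import prod


def likelihood(phi, xs):
    """Probability function of the observed sequence xs under phi."""
    return prod(phi[x] for x in xs)


def posterior_odds_against_phi0(xs, phi0, alternatives, weights, p):
    """Posterior odds against phi_0, with prior p on phi_0 and (1 - p) w_i on phi_i."""
    q = 1 - p
    against = sum(q * w * likelihood(phi, xs) for phi, w in zip(alternatives, weights))
    return against / (p * likelihood(phi0, xs))


def weighted_likelihood_ratio(xs, phi0, alternatives, weights):
    """l_w = sum_i w_i l_i0 for the observed sequence xs."""
    base = likelihood(phi0, xs)
    return sum(w * likelihood(phi, xs) / base for phi, w in zip(alternatives, weights))


phi0 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
alternatives = [
    {0: Fraction(3, 4), 1: Fraction(1, 4)},
    {0: Fraction(1, 4), 1: Fraction(3, 4)},
]
weights = [Fraction(1, 3), Fraction(2, 3)]
p = Fraction(2, 5)
xs = (1, 0, 1)

odds = posterior_odds_against_phi0(xs, phi0, alternatives, weights, p)
l_w = weighted_likelihood_ratio(xs, phi0, alternatives, weights)
```

For these observations `l_w` is 7/8 and `odds` is 21/16, which equals $q\,l_w/p$ with p = 2/5. Since `l_w` exceeds p/q = 2/3, the odds here do not favour $\phi_0$.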

We should perhaps observe at this point that in general the specification of a Bayes rule 
will involve the specification of the quantitative losses expected to be incurred from taking 
wrong decisions, in addition to the prior probabilities. It happens in our case that this is 
unnecessary because we are considering only a two-way decision—to take action A, or not. 
Thus in extending our treatment to cases of multiple decisions we would have to introduce 
this slight extra complication. The general Bayes principle will be to take action A when the 
expected gain from doing so outweighs the expected loss. 

Now in our formulation of the decision problem, whether or not to take action A, we have
said nothing about how the true $\phi$ comes to be determined. It is therefore consistent with
our assumptions (though it does not follow from them) to assume further that the true $\phi$ is
determined by a random procedure in which the probability that $\phi = \phi_i$ is $q w_i$, while the
probability that $\phi = \phi_0$ is p. With this further assumption, it is clear that the Bayes rule
given above is best possible, subject to the condition that $\Pr\{A \mid \phi_0\} = p$. Any other rule
would mean that we could find two situations such that the posterior probability of $\phi_0$ in
the first situation was less than it was in the second situation, and yet we would take action
A in the first situation but not in the second; we would clearly then improve our chances of
acting rightly if we interchanged our actions in the two cases.

Now this argument shows that every Bayes rule with non-zero weights $w_i$ is admissible.
For if there were a uniformly more powerful rule we could, by using it, in the case where our
further assumption is true, reduce the frequency with which action A was wrongly taken,
without decreasing the frequency with which action A was rightly taken; and we have just
seen this to be impossible. If some of the weights $w_i$ are zero, the argument is not conclusive.
For it might happen that the uniformly more powerful rule reduced the probabilities of
error only for those $\phi_i$ for which the weights $w_i$ were zero. We would then not improve on
the Bayes rule in those cases where the further assumption was true; but we might improve
on the Bayes rule in other cases. What does follow in this case is that there can be no rule
which gives smaller risks of error in all those cases $\phi_i$ for which the weights $w_i$ are non-zero.
If we say that $D$ is weakly admissible if no $D'$ exists such that

$$\Pr\{A \mid D', \phi_0\} = \Pr\{A \mid D, \phi_0\},$$
while
$$\Pr\{A \mid D', \phi_i\} < \Pr\{A \mid D, \phi_i\} \quad \text{for all } i = 1, 2, \ldots, k,$$


then our argument shows that every Bayes rule is weakly admissible. 
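
The argument can be tried out on a small discrete example. The sketch below is illustrative only: the four-point sample space, the two alternative distributions and the unit weights are hypothetical, only non-randomized rules are compared, and we read $\Pr\{A \mid \phi_i\}$ for $i \ge 1$ as the risks of error, as in the inequality above. Under those assumptions the Bayes rule takes action A on the observations with the smallest weighted likelihood ratio, and no rival rule with the same $\Pr\{A \mid \phi_0\}$ does uniformly better.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical toy problem (not from the paper): four possible observations.
phi0 = [F(1, 4)] * 4                       # distribution under phi_0
alternatives = [                           # distributions under phi_1, phi_2
    [F(1, 10), F(2, 10), F(3, 10), F(4, 10)],
    [F(4, 10), F(1, 10), F(1, 10), F(4, 10)],
]
weights = [F(1), F(1)]                     # non-zero weights w_1, w_2


def pr_action(region, dist):
    """Pr{A | dist} for the rule that takes action A exactly on `region`."""
    return sum(dist[x] for x in region)


def weighted_ratio(x):
    """Weighted likelihood ratio sum_i w_i phi_i(x) / phi_0(x)."""
    return sum(w * alt[x] for w, alt in zip(weights, alternatives)) / phi0[x]


def bayes_region(size):
    """Take A on the `size` observations with the smallest weighted ratio."""
    return tuple(sorted(sorted(range(4), key=weighted_ratio)[:size]))


def dominates(rule, other):
    """True if `rule` is never worse than `other` under the alternatives
    and strictly better under at least one of them."""
    a = [pr_action(rule, alt) for alt in alternatives]
    b = [pr_action(other, alt) for alt in alternatives]
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Every two-point region has Pr{A | phi_0} = 1/2, so all rivals are comparable.
bayes = bayes_region(2)
rivals = [r for r in combinations(range(4), 2) if r != bayes]
print(bayes, any(dominates(r, bayes) for r in rivals))
```

Exact fractions are used so that the comparisons between rules are not disturbed by rounding.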

Thus, to sum up, we have shown that every admissible rule is a Bayes rule, and every 
Bayes rule is weakly admissible, while every Bayes rule with non-zero weights is admissible. 
By going deeper we could clearly prove more. For example, under suitable continuity 
restrictions we could show that any Bayes rule with some weights zero could be obtained as 
a limit from a sequence of Bayes rules with non-zero weights, and then such a Bayes rule 
would be admissible, not merely weakly admissible. More simply, we may observe that any 
Bayes rule is admissible for the restricted set of alternatives for which the weights are non- 
zero. So that in any case we may say that, in so far as we adopt the approach of Neyman 
& Pearson, as carried further by Wald and Lindley, we are led to consider the class of all 
Bayes solutions to the decision problem as containing all the solutions worth consideration, 
and as containing no solution wholly objectionable. 
ADMISSIBLE SAMPLING RULES


Before concluding we would like to indicate some of the points involved in extending our 
considerations to the question of selecting ‘admissible sampling rules’. We therefore relax 
the restriction so far made that the sampling rule is fixed. We now consider sampling rules 
which are, in some sense, ‘equivalent’ in respect of admissible final decisions. It seems 
reasonable to assume that the weights $w_i$ to be used in making the final decision are fixed
for us, since their choice will be conditioned mainly by the relative importance of the various
sorts of mistake which might be involved in wrongly taking action A, or in wrongly failing
to take it. Thus the likelihood ratio $\lambda$ to be used in making our final decision will be a given
function of the observations. And from the preceding considerations it seems reasonable to
say that one sampling rule will be equivalent to another in respect of final decisions if the
associated cumulative distribution function of $\lambda$ is the same for both. This will imply that,
for every choice of $\Pr\{A \mid \phi_0\} = p$, the minimum value of the weighted odds of error will be
the same for all sampling rules considered; conversely, this condition implies the former one.
We thus consider only sampling schemes for which the cumulative distribution function of
$\lambda$ is fixed as $F(\lambda)$. These sampling schemes will differ amongst themselves in the numbers of
observations called for when they are used. We shall want the number of observations to be 
as small as possible, in some sense. 

In general, the number of observations called for by a sampling rule will be a variate 
whose distribution depends on which of the possible hypotheses happens to be the true one; 
but in order to keep things simple we shall consider only one probability function $\phi$. Then
if the typical function of the sampling rule $S$ is $s_r(x_1, x_2, \ldots, x_r)$, and we define

$$t_r = (1 - s_1)(1 - s_2) \cdots (1 - s_r),$$


then the probability that the sampling rule $S$ calls for a sample larger than $r$ is

$$\Pr\{N > r \mid S\} = \int t_r \, \phi \, dx_1 \, dx_2 \cdots dx_r.$$
We now define an admissible sampling rule to be a sampling rule $S$ which is such that no $S'$
exists, equivalent to $S$ in respect of final decisions, and such that
$$\Pr\{N > r \mid S'\} \le \Pr\{N > r \mid S\} \quad \text{for all } r = 1, 2, \ldots, (n-1),$$
while for some $r$
$$\Pr\{N > r \mid S'\} < \Pr\{N > r \mid S\}.$$


Now suppose $S$ is admissible, and consider the $n$-dimensional set $C$ of points $u$ such that
the $r$th co-ordinate of $u$ is
$$u_r = \Pr\{N > r \mid S'\} - \Pr\{N > r \mid S\},$$


where S’ ranges over all sampling rules, admissible or not, which are equivalent to S in 
respect of final decisions. Then C has the property P of our geometrical lemma. It is easy 
to see that no point of C is interior to the negative aliquant. We have to show that C is
convex. 
Let us suppose that $S'$, with typical function $s'_r$, and $S''$, with typical function $s''_r$, both
satisfy the conditions laid down. We define
$$t'_r = (1 - s'_1)(1 - s'_2) \cdots (1 - s'_r), \qquad p'_r = s'_r t'_{r-1},$$
and $t''_r$ and $p''_r$ are defined similarly. We then define another sampling rule $\Sigma$ by the conditions
$$\sigma_1 = (s'_1 + s''_1)/2, \qquad \sigma_r = (s'_r t'_{r-1} + s''_r t''_{r-1})/(t'_{r-1} + t''_{r-1}).$$

Since $\sigma_r$ is a weighted mean of $s'_r$ and $s''_r$ it lies between them, and hence it lies between 0 and 1,
and so it represents the typical function of a possible sampling rule. We also have

$$\tau_r = (1 - \sigma_1)(1 - \sigma_2) \cdots (1 - \sigma_r) = \{(t'_r + t''_r)/(t'_{r-1} + t''_{r-1})\}\,\tau_{r-1} = (t'_r + t''_r)/2,$$
and hence
$$\pi_r = \sigma_r \tau_{r-1} = (p'_r + p''_r)/2.$$


Thus the probability that a final decision will be taken on the basis of the sequence of
observations $(x_1, x_2, \ldots, x_r)$ when the sampling rule $\Sigma$ is used is half-way between that when
the sampling rule $S'$ is used and that when the sampling rule $S''$ is used. It follows that since
$F(\lambda)$ is the cumulative probability function for the likelihood ratio $\lambda$ when either $S'$ or $S''$ is used, $F(\lambda)$ is still
the cumulative probability function for $\lambda$ when $\Sigma$ is used. Thus $\Sigma$ is equivalent to $S$ in
respect of final decisions, and corresponds to a point $v$ in $C$ for which the $r$th co-ordinate

$$v_r = (u'_r + u''_r)/2,$$

where $u'_r$ and $u''_r$ are the $r$th co-ordinates of the points in $C$ corresponding to $S'$ and $S''$. Thus
when C contains two points it contains the midpoint of the line joining them, and so is 
convex in a slightly weaker sense than the usual one. It can easily be seen that our proof 
of the geometrical lemma remains valid for this weaker sense of convexity. 
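
The half-way property of the combined rule can be checked numerically for a single fixed sequence of observations. The following is a minimal sketch under stated assumptions: the typical functions are represented simply by their values along that sequence, the numerical values are hypothetical, and the function names are ours.

```python
from fractions import Fraction as F


def survival(s):
    """t_r = (1 - s_1)...(1 - s_r) for r = 0, 1, ..., len(s), with t_0 = 1."""
    t = [F(1)]
    for s_r in s:
        t.append(t[-1] * (1 - s_r))
    return t


def stop_probs(s):
    """p_r = s_r * t_{r-1}: chance of taking the final decision at stage r."""
    t = survival(s)
    return [s_r * t[r] for r, s_r in enumerate(s)]


def mix(s1, s2):
    """Typical function of the combined rule: the mean of the two typical
    functions weighted by the chances of reaching the stage.
    Assumes the two rules have not both stopped with certainty earlier."""
    t1, t2 = survival(s1), survival(s2)
    return [(a * t1[r] + b * t2[r]) / (t1[r] + t2[r])
            for r, (a, b) in enumerate(zip(s1, s2))]


# Hypothetical values of the two typical functions along one sequence.
s_prime = [F(1, 2), F(1, 3), F(1)]
s_double_prime = [F(0), F(1, 4), F(1)]
sigma = mix(s_prime, s_double_prime)

print(survival(sigma))
print(stop_probs(sigma))
```

Because `survival` starts from $t_0 = 1$ for both rules, the first value returned by `mix` is the plain average of the two first-stage values, and the later ones are the weighted means.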

It now follows that if S is an admissible sampling rule it must be such as to minimize 


a weighted sum $\sum_r C_r \Pr\{N > r \mid \phi\}$,

with non-negative $C_r$. These $C_r$ can clearly be interpreted as the extra cost involved in taking
more than r observations given that we have already taken r. If this is done, then S will be 
a rule which minimizes the expected cost of the observations. Such a rule is called a Bayes 
sampling rule, and this terminology can be justified, and a converse proved, as with the 
final decision rules. 
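
The cost interpretation rests on an elementary identity: a weighted sum of the tail probabilities of N equals the expected total of the extra costs incurred before sampling stops. A small check, in which the distribution of N and the costs are hypothetical values chosen only for illustration:

```python
from fractions import Fraction as F

# Hypothetical distribution of the sample size N.
pr_N = {1: F(1, 2), 2: F(1, 4), 3: F(1, 8), 4: F(1, 8)}
# Hypothetical extra cost C_r of going on after r observations.
C = {1: F(3), 2: F(2), 3: F(1)}


def tail(r):
    """Pr{N > r}."""
    return sum(p for n, p in pr_N.items() if n > r)


# Weighted sum of tail probabilities.
weighted_sum = sum(C[r] * tail(r) for r in C)

# Expected cost: a sample of size n pays C_1 + ... + C_{n-1}.
expected_cost = sum(p * sum(C[r] for r in range(1, n)) for n, p in pr_N.items())

print(weighted_sum, expected_cost)
```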

We cannot proceed further, to consider the simultaneous changing of the final decision 
rule and the sampling rule without, as it were, establishing a rate of exchange which decides 
how we are to balance a gain in the reliability of our decision against an increase in the 
number of observations. Wald’s completely pragmatic approach seems to be the only way 
of doing this. 


CONCLUSION—UNIFORMLY MOST POWERFUL RULES AND INFERENCE 

Reverting now to the case where the sampling rule is fixed, it may turn out in particular 
cases that there is only one admissible decision rule. This will be so if the sets of observations 
for which the likelihood ratio $\lambda < \lambda(p)$ are independent of the choice of weights $w_i$. Now this is the classical
condition of Neyman & Pearson for the existence of a uniformly most powerful test. And
as we have already remarked, if there is only one admissible rule, it must be uniformly most
powerful. 

In other cases, where no uniformly most powerful rule exists, the choice among the 
admissible rules must be based on features of the problem not so far taken into account— 
features such as would lead, for example, to at least an approximate specification of a prior 
probability distribution. In the absence of such further information the decision problem
has an essentially arbitrary character, which sharply differentiates it from the inference 
problem. An arbitrary inference cannot be valid in any objective sense; thus we are led to 
follow Fisher, in basing our inferences on likelihood, which is uniquely determined by the 
observations. 

There is here an analogy with diagnosis and treatment in medical practice. The inference 
from the data is the diagnosis; the decision adopted is the treatment. The diagnosis does not 
uniquely determine the treatment, in deciding on which the doctor will have regard to other 
features of the case besides the patient’s specific illness. 

A tendency exists in the literature to play down the arbitrariness of the decision problem, 
sometimes by the introduction of arbitrary principles like Bayes’s axiom, the Neyman con- 
ditions of ‘unbiasedness’ and ‘similarity’, or the minimax principle, to name some in 
historical order. It seems to the present writer that this tendency arises from a mistaken 
attempt to identify the decision problem with the inference problem. An inference from 
data should be unique, though it may be uncertain. But to try to give rules for unique 
decisions, in the absence of elements essential to a unique solution, is like trying to design
a bridge without knowing the strength of the materials available—only very crude approxi- 
mations will be possible. 
MISCELLANEA


Sequential tests for binomial and exponential populations 
By F. J. ANSCOMBE and E. S. PAGE, Statistical Laboratory, Cambridge


Consider a series of independent trials in each of which a certain event has probability $p$ of occurring.
If the trials take place at equally spaced time-instants, so that the time elapsing from one trial to the
next is always $\tau$, and if we consider the limit $\tau \to 0$ and $p \to 0$ with $p/\tau$ constant, say $p/\tau = \lambda$, then in the
limit the occurrences of the event form a Poisson process with mean rate $\lambda$ per unit time. Let $t_1$ denote
the time elapsing from an arbitrary origin to the first following occurrence of the event in the Poisson
process, and let $t_r$ ($r \ge 2$) denote the time elapsing from the $(r-1)$th to the $r$th occurrence. Then each
$t_r$ ($r \ge 1$) has an exponential distribution with probability element $e^{-\lambda t} \lambda \, dt$.
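
The passage to the limit is easily illustrated numerically. In the sketch below (the rate and the time are hypothetical values, and the function name is ours) the chance that the event has not occurred by time $t$, when trials are spaced $\tau$ apart with $p = \lambda\tau$, is compared with the exponential value $e^{-\lambda t}$ as $\tau$ decreases.

```python
import math


def geometric_tail(lam, t, tau):
    """Chance of no occurrence in the first t/tau trials, each with p = lam * tau."""
    n_trials = round(t / tau)
    return (1.0 - lam * tau) ** n_trials


lam, t = 2.0, 1.5                      # hypothetical rate and elapsed time
for tau in (0.1, 0.01, 0.001, 0.0001):
    print(tau, geometric_tail(lam, t, tau), math.exp(-lam * t))
```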

Thus, starting with a binomial distribution of trials, we have reached an exponential distribution of 
times. It is to be expected, therefore, that there will be a close relation between (i) a Wald sequential
test of the parameter $p$ of a binomial population, considered in the limit when the values of $p$ concerned
are infinitesimally small, and (ii) a Wald sequential test of the parameter $\lambda$ of an exponential population.
It is the purpose of this note to point out that, given a test of either sort, (i) or (ii), there is a test of the
other sort having an identical operating characteristic, and that the average sample sizes of these two
tests are connected by a simple relation (given at (4) below).

For the binomial population, consider the likelihood-ratio sequential test (Wald, 1947) for the two
hypotheses, $H_0$ that $p = p_0$ and $H_1$ that $p = p_1$, where $p_1 > p_0 > 0$. After any number $n$ of observations
have been taken, let $n_1$ denote the total number of occurrences of the event. Then sampling continues
while
$$b + n \ln\frac{1-p_0}{1-p_1} < n_1 \ln\frac{p_1(1-p_0)}{p_0(1-p_1)} < a + n \ln\frac{1-p_0}{1-p_1}, \qquad (1)$$
where $a$ ($> 0$) and $b$ ($< 0$) are constants, and sampling terminates as soon as one of the inequalities is
violated, $H_1$ being accepted if the right-hand inequality is violated, $H_0$ if the left-hand inequality is
violated. We call the chance of accepting $H_0$, considered as a function of the parameter $p$, the operating
characteristic of the test. Note that if $H_1$ is accepted, the event occurred at the last observation, while
if $H_0$ is accepted the event did not occur at the last observation.

Let us now set $p_0/\tau = \lambda_0$, $p_1/\tau = \lambda_1$, $n\tau = T$, and let $\tau \to 0$. The sampling condition (1) becomes

$$b + (\lambda_1 - \lambda_0) T < n_1 \ln(\lambda_1/\lambda_0) < a + (\lambda_1 - \lambda_0) T. \qquad (1a)$$

Observation of the Poisson process continues as long as (1a) is satisfied, and ceases at the first instant
when one of the inequalities in (1a) is violated. If the right-hand inequality is violated, so that $H_1$ is
accepted, sampling terminates with an occurrence of the event, while if the left-hand inequality is
violated and $H_0$ is accepted, sampling terminates with an interval in which the event does not occur.
It is convenient to regard the operating characteristic as a function of $\lambda$, instead of $p$.
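
The limiting step from (1) to (1a) may be verified term by term. The working below is a sketch, assuming that (1) is the usual Wald continuation region written in terms of $n$ and $n_1$, and using only the substitutions $p_0 = \lambda_0\tau$, $p_1 = \lambda_1\tau$ and $n = T/\tau$ stated in the text.

```latex
\begin{align*}
n \ln\frac{1-p_0}{1-p_1}
  &= \frac{T}{\tau}\,\bigl[\ln(1-\lambda_0\tau) - \ln(1-\lambda_1\tau)\bigr]
   = \frac{T}{\tau}\,\bigl[(\lambda_1-\lambda_0)\tau + O(\tau^2)\bigr]
   \;\longrightarrow\; (\lambda_1-\lambda_0)\,T, \\
\ln\frac{p_1(1-p_0)}{p_0(1-p_1)}
  &= \ln\frac{\lambda_1}{\lambda_0} + \ln\frac{1-\lambda_0\tau}{1-\lambda_1\tau}
   \;\longrightarrow\; \ln\frac{\lambda_1}{\lambda_0}
   \qquad (\tau \to 0).
\end{align*}
```

The first limit supplies the terms in $T$ on either side of (1a), and the second supplies the multiplier of $n_1$.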

Turning now to an exponential population of times $t$ with probability element $e^{-\lambda t} \lambda \, dt$, we consider
the likelihood-ratio sequential test for the two hypotheses, $H_0$ that $\lambda = \lambda_0$ and $H_1$ that $\lambda = \lambda_1$, where
$\lambda_1 > \lambda_0 > 0$. Sampling continues while

$$b' + (\lambda_1 - \lambda_0) \sum_{r=1}^{n} t_r < n \ln(\lambda_1/\lambda_0) < a' + (\lambda_1 - \lambda_0) \sum_{r=1}^{n} t_r, \qquad (2)$$

where $a'$ ($> 0$) and $b'$ ($< 0$) are constants, and sampling terminates with the acceptance of $H_1$ if the right-hand
inequality is first violated and with the acceptance of $H_0$ if the left-hand inequality is first violated.
We call the chance of accepting $H_0$, as a function of $\lambda$, the operating characteristic of the test.

Let us now identify the observations $t_r$ from the exponential population with the times between
consecutive occurrences of the event in the Poisson process. The $n$ in (2) is identified with the $n_1$ in (1a).
The two tests, with respective conditions (1a) and (2), will lead always to the same decision, and so have
the same operating characteristic, provided that

$$a' = a, \qquad b' = b + \ln(\lambda_1/\lambda_0). \qquad (3)$$

Then the difference between the tests is merely that if the observations lead to the acceptance of $H_0$,
sampling rule (1a) requires that sampling ceases at the instant when the left-hand inequality becomes
an equality, while sampling rule (2) prolongs the time of observation until the next occurrence of the
event. If the observations lead to the acceptance of $H_1$, both rules operate alike. Note that the extent
of observation $T$ in rule (1a) corresponds to the cumulative sum of the readings $\sum t_r$ in rule (2), while the
cumulative sum of the readings $n_1$ in rule (1a) corresponds to the number of observations $n$ in rule (2).
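
The correspondence between the two rules can be traced on particular sequences of intervals. The sketch below is illustrative only: the function names, the rates and the constants are hypothetical, and the rules are coded directly from the inequalities (1a) and (2) with the constants connected by (3). It is assumed that $b + \ln(\lambda_1/\lambda_0) < 0$, so that $b'$ is negative as the text requires.

```python
import math


def rule_1a(times, a, b, lam0, lam1):
    """Continuous observation of the Poisson process under condition (1a).

    `times` are the intervals between consecutive occurrences.
    Returns (decision, n1, T) at the instant sampling stops.
    """
    delta, log_ratio = lam1 - lam0, math.log(lam1 / lam0)
    T, n1 = 0.0, 0
    for t in times:
        # Instant at which b + delta*T catches up with n1*log_ratio.
        T_left = (n1 * log_ratio - b) / delta
        if T + t >= T_left:               # left-hand inequality fails first
            return "H0", n1, T_left
        T, n1 = T + t, n1 + 1             # an occurrence of the event
        if n1 * log_ratio >= a + delta * T:
            return "H1", n1, T            # right-hand inequality fails
    raise ValueError("ran out of observations before a decision")


def rule_2(times, a2, b2, lam0, lam1):
    """Test (2) on the exponential readings; returns (decision, n, sum of t_r)."""
    delta, log_ratio = lam1 - lam0, math.log(lam1 / lam0)
    total = 0.0
    for n, t in enumerate(times, start=1):
        total += t
        if n * log_ratio >= a2 + delta * total:
            return "H1", n, total
        if b2 + delta * total >= n * log_ratio:
            return "H0", n, total
    raise ValueError("ran out of observations before a decision")


lam0, lam1, a, b = 1.0, 2.0, 2.0, -2.0          # hypothetical values
a2, b2 = a, b + math.log(lam1 / lam0)           # relation (3)

for times in ([0.1] * 10, [3.0, 1.0], [0.5, 0.5, 4.0]):
    print(rule_1a(times, a, b, lam0, lam1), rule_2(times, a2, b2, lam0, lam1))
```

In each case the decisions agree; when $H_0$ is accepted the count returned by `rule_2` exceeds that of `rule_1a` by one and its total time is the longer, in line with the remark that rule (2) waits for the next occurrence.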

Assuming always that (3) holds, let $\mathcal{E}_1$, $\mathcal{E}_{1a}$ and $\mathcal{E}_2$ denote expectations when sampling terminates
according to rules (1), (1a) and (2), respectively. We have

$$\mathcal{E}_{1a}(T) = \lim_{\tau \to 0} \tau \mathcal{E}_1(n).$$

By a well-known identity of Wald’s (1947, equation (A: 68)),
$$\mathcal{E}_{1a}(n_1) = \lambda \mathcal{E}_{1a}(T).$$

The $n$ at the termination of sampling by rule (2) is equal either to $n_1$ or to $n_1 + 1$ at the termination of
sampling by rule (1a), according as $H_1$ or $H_0$ is accepted, so that, if $P$ denotes the probability that $H_0$
will be accepted,
$$\mathcal{E}_2(n) = \mathcal{E}_{1a}(n_1) + P.$$
Thus we have the relation
$$\mathcal{E}_2(n) = P + \lambda \mathcal{E}_{1a}(T) = P + \lim_{\tau \to 0} p\,\mathcal{E}_1(n). \qquad (4)$$


Explicit formulae for $P$ and for $\mathcal{E}_1(n)$ or $\mathcal{E}_{1a}(T)$ were given by Burman (1946). Thus explicit formulae
are available for the operating characteristic and average sample size of the test of the exponential
population. It may be noted that, apparently in ignorance of Burman’s work, Dvoretzky, Kiefer
& Wolfowitz (1953) have obtained the same formulae for the Poisson process.

The above results concerning the relation between the Poisson process and the exponential population
can alternatively be deduced directly from a consideration of (i) the differential-difference equations
satisfied by $P$ and $\mathcal{E}_{1a}(T)$ for the Poisson process, these being limiting forms of the equations given by
Burman, and (ii) the corresponding integral equations satisfied by $P$ and $\mathcal{E}_2(n)$ for the exponential
population.
Tables of generalized k-statistics
By S. H. ABDEL-ATY, University College, London


1. In a recent publication Wishart (1952) set out a technique for obtaining the moment coefficients
of the multivariate distribution of the sample estimates (the k-statistics) of the population cumulants
for the case where the population from which the sample is drawn is of finite size $N$. Use is made of
a generalized k-statistic $k_{rs\ldots}$ having the property that its expectation over the finite population is equal
to the same generalized K-statistic calculated from the whole population. This is expressed by

$$\mathcal{E}_N(k_{rs\ldots}) = K_{rs\ldots}. \qquad (1\text{-}1)$$

$k_{rs\ldots}$ may be defined in terms of the augmented symmetric functions of the sample observations, and of
the sample size $n$ (see Wishart, 1952, equation (2-1)), or, inversely, any augmented symmetric function
may be defined in terms of the $k_{rs\ldots}$, as in Wishart’s equation (2-2). It follows that provided any sample
criterion in which we are interested can be expressed in terms of monomial symmetric functions (in
particular, this is true for polynomial functions of the Fisher k-statistics), its expectation can be obtained
immediately by (1-1). Further, the generalized K-statistics in the answer may, if desired, be expressed
in terms of the elements of the population by using the relation for $k_{rs\ldots}$ (Wishart, 1952, equation (2-1)),
with $K$ written for $k$ and $N$ for $n$, and interpreting the augmented symmetric functions as applying to
the whole population.
2. Wishart provided a table giving the numerical coefficients of his reciprocal relations (2-1) and
(2-2), up to and including functions of weight 6. The present table has been worked out for weight 12.
To find a generalized k-statistic in terms of the augmented symmetric functions the entry is made down
the appropriate column as far as and including the diagonal unity printed in bold type. For the inverse
relationship the appropriate row is entered as far as and including the diagonal unity. Thus, for example,
we find

$$k_{32^21^5} = 2[1^{12}]/n^{(12)} - 7[21^{10}]/n^{(11)} + 8[2^21^8]/n^{(10)} - 3[2^31^6]/n^{(9)} + [31^9]/n^{(10)} - 2[321^7]/n^{(9)} + [32^21^5]/n^{(8)}.$$

The table includes all tables of weight less than 12, in the sense that the required coefficients for a given
weight can be picked out from it and the remainder discarded. Thus to obtain the table of weight 11 we
cancel unity once in the subscript of the k’s and in the augmented symmetrics and discard all columns
and rows where it is not possible to do this. For example, from the expression for $k_{32^21^5}$ given above we
obtain

$$k_{32^21^4} = 2[1^{11}]/n^{(11)} - 7[21^9]/n^{(10)} + 8[2^21^7]/n^{(9)} - 3[2^31^5]/n^{(8)} + [31^8]/n^{(9)} - 2[321^6]/n^{(8)} + [32^21^4]/n^{(7)}.$$
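
The coefficients in such an expression can be reproduced mechanically. The sketch below is illustrative and rests on an assumption not spelled out in this note: that the generalized k-statistic is formed by symbolic multiplication of the familiar expressions of Fisher's $k_1$, $k_2$, $k_3$ in augmented symmetric functions, each function $[p]$ being divided by $n^{(\rho)}$ with $\rho$ the number of parts of $p$. The function names are ours.

```python
from collections import Counter
from itertools import product

# Fisher's k-statistics in augmented symmetric functions: a tuple of parts p
# stands for [p] / n^(len(p)), where n^(m) = n(n-1)...(n-m+1).
FISHER = {
    1: {(1,): 1},
    2: {(2,): 1, (1, 1): -1},
    3: {(3,): 1, (2, 1): -3, (1, 1, 1): 2},
}


def generalized_k(subscript):
    """Coefficients of k_{subscript}, by symbolic multiplication of its factors."""
    result = Counter({(): 1})
    for r in subscript:
        new = Counter()
        for (p, c), (q, d) in product(result.items(), FISHER[r].items()):
            # Multiplying two terms joins their parts into one partition.
            new[tuple(sorted(p + q, reverse=True))] += c * d
        result = new
    return {p: c for p, c in result.items() if c != 0}


def cancel_unity(coeffs):
    """Drop one unit part from every term; terms with no unit part are discarded."""
    out = {}
    for p, c in coeffs.items():
        if 1 in p:
            q = list(p)
            q.remove(1)
            out[tuple(q)] = c
    return out


k_weight_12 = generalized_k((3, 2, 2, 1, 1, 1, 1, 1))
for parts, coeff in sorted(k_weight_12.items()):
    print(coeff, parts, "over n^(%d)" % len(parts))
```

Applying `cancel_unity` to the weight 12 result gives the same coefficients attached to the corresponding functions of weight 11, which is the reduction described in the text.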
3. The introduction of the generalized k-statistics enables moments of sample criteria to be written
down succinctly and it restores symmetry to the expressions for the case of the finite population where
previously little or no pattern was visible. This is particularly so for the sample mean. Thus if we write

$$M(1^r) = \mathcal{E}_N(k_1 - K_1)^r,$$





r—2 r—3 
A, = ar a —...(—- ly 
ey Fae 
pits 48 
we have
$$\begin{aligned}
M(1) &= 0, \\
M(1^2) &= K_2A_2, \\
M(1^3) &= K_3A_3, \\
M(1^4) &= K_4A_4 + 3K_{22}A_2^2, \\
M(1^5) &= K_5A_5 + 10K_{32}A_3A_2, \\
M(1^6) &= K_6A_6 + 15K_{42}A_4A_2 + 10K_{33}A_3^2 + 15K_{222}A_2^3, \\
M(1^7) &= K_7A_7 + 21K_{52}A_5A_2 + 35K_{43}A_4A_3 + 105K_{322}A_3A_2^2, \\
M(1^8) &= K_8A_8 + 28K_{62}A_6A_2 + 56K_{53}A_5A_3 + 35K_{44}A_4^2 + 210K_{422}A_4A_2^2 + 280K_{332}A_3^2A_2 + 105K_{2222}A_2^4,
\end{aligned}$$
and so on. The numerical coefficients are the same as those of the appropriate augmented symmetric
functions. Yet in spite of the elegance of these expressions it is open to question whether the procedure
involved in applying them to actual data is any improvement over that suggested by Irwin & Kendall
(1944). In order to obtain $M(1^5)$ in terms of the moments of the population it is necessary to express the
generalized K-statistics in terms of Fisher’s K’s. Wishart gives the necessary relations, but the solution
of a determinant of the sixth order is required, the same determinant incidentally as that to which we
are led by Irwin & Kendall. This would appear always to be the case.
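
The numerical coefficients in the expressions for $M(1^r)$ can be generated directly. The sketch below assumes, as the listed values suggest, that the coefficient attached to a partition of $r$ with no unit parts is the number of ways of dividing $r$ objects into groups of the sizes given by its parts; the function names are ours.

```python
from collections import Counter
from math import factorial


def partitions_without_units(r, largest=None):
    """Partitions of r into parts of size 2 or more, as non-increasing tuples."""
    if largest is None:
        largest = r
    if r == 0:
        yield ()
        return
    for part in range(min(r, largest), 1, -1):
        for rest in partitions_without_units(r - part, part):
            yield (part,) + rest


def coefficient(partition):
    """Number of ways of dividing sum(partition) objects into groups of these sizes."""
    count = factorial(sum(partition))
    for size, mult in Counter(partition).items():
        count //= factorial(size) ** mult * factorial(mult)
    return count


for r in range(2, 9):
    print(r, [(p, coefficient(p)) for p in partitions_without_units(r)])
```

For $r = 8$ the partitions appear in the order 8, 62, 53, 44, 422, 332, 2222, the same order as the terms of $M(1^8)$.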


I should like to express my thanks to Dr F. N. David and Professor M. G. Kendall for the 
encouragement they have given me in this work. 
[Table of generalized k-statistics, weight 12 (continued).]
[Table of generalized k-statistics, weight 12 (iv).]
[Table of generalized k-statistics, weight 12 (v).]
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‘ , | 

Weight 12 (vi) Regs? Rear Rew Rod Ror Ros Rio, 1% Rios 2 Ruy Ris 
i 

1**)/n 0) — 5040 — 10080 10240 0320 — 40320 80640 — 362880 362880 3628800 —39916800} ? 
21™)/n (1) 30240 55440 oumnass age = 221760 | —483840 | 1814400 | —2177280 ~ saneeeoe 239500800 
2*19)/n@ —7os — 110880 408240 2721 —453600 | 1088640 | —3175200 4989600 39916800 | —538876800 
2°1°}/n 0640 95760 | —423360 | —151200 423360 |—1118880 | 2268000 | —5443200 | — 34927200 558835200 
2*1*}/n — 45990 —31500 200340 22680 | —173880 498960 | —567000 2835000 12474000 | -—261954000 
2°1*}/n™ 11340 1890 — 37800 ; 22680 — 68040 22680 — 589680 — 1247400 4490! 

2\/n™ —630 . 1890 . ; 4 ‘ 22680 ‘. — 1247. 
31°)/n) —6720 — 18480 60480 60480 — 60480 161280 — 604800 604800 6652800 — 79833600 
321'}/n) 26880 67200 | —241920 | —151200 211680 | —665280 1814400 | —2419200 | —23284800 319334400 
32*1°)/n(*) — 38640 —75600 312480 90720 | —241920 907200 | —1512000 3326400 24948000 | —419126400 
32°1°)/n() 23520 25200 | —141120 —7560 98280 | —438480 302400 | —1814400 — 8316000 199584000 
32*t]/n(® — 5040 —630 17640 3 —7560 45360 ‘ 302400 415800 — 2494 000 ) 
3°1°}/n — 1680 — 10080 sighs 20160 — 20160 100800 — 252000 252000 3326400 — 46569600 
3321*}/n™ 3920 19600 — 77280 —15§120 35280 | —241920 302400 — 554400 — 5544000 99792000 
3°2*1*)/n(® — 2800 —6720 31920 - —15120 136080 — 37800 340200 1663200 — 49896000 
3°2°)/n 560 ‘ — 1680 ; —7560 . — 37800 : 332 
371°)}/n _ | 1680 6720 560 — 560 21280 — 16800 16800 369600 — 7392000 
3°21]/n oI 560 — 2240 + 560 — 16800 J — 16800 —92400 4435200 
*\/n , : 4 " é 5 5 . a — 92400 
41°}/n 1680 | 3360 —15120 — 15120 15120 — 30240 | 151200 — 151200 — 1663200 19958400 
421°)/n\) — 5880 — 10080 55440 30240 — 45360 105840 | —378000 529200 4989600 — 69854400 
42*1*}/n 71 8400 — 63000 — 11340 41580 | —113400 | 226800 — 604800 — 4158000 74844000 ‘ 
42°17)/n —33 — 1260 22680 : — 11340 34020 | — 18900 245700 831600 — 24948000 } 
42‘}/n\) 420 " — 1890 . ; 4 é — 18900 5 1247 
431°}/n™ 560 2800 — 16800 —7560 7560 — 30240 | 100800 | — 100800 — 1386000 19958400 
4321°)/n\® — 1120 — 4200 30240 2520 — 10080 | 57960 | —75600 | 176400 1663200 — 33264000 
43271) /n) 560 420 — 8400 ; 2520 — 18900 | ‘ — 75600 — 207900 9979200 
43°1*]/n® ° 560 —3920 | | —7560 | 4200 — 4200 — 138600 3326. 
43°2]/n%) , : 560 | = 2520 4200 : —831 
4'1*}/n™ —35 —70 1890 | 630 —630 1260 —9450 9450 138600 — 2079000 | 
42217) /n( 70 | 105 — 2940 | 630 | —1890 | 3150 — 12600 — 103950 2494800 
4°2")/n™ —35 a 525 | s | > | 5 3150 ‘ — 311850 
4°31) /n™ a —35 700 | | 2 630 I1g§50 — 415800 5 
42)/n@) : : —35 | . : ¢ 3 | x : 11550 ; 
51"}/n*) —336 —672 2016 | 302. — 3024 6048 | — 30240 | 30240 332640 — 3991680 
521°}/n™ 1008 1680 — 6048 —453 7560 — 18144 60480 | — 90720 —831600 11975040 
$2°1°)}/n( — 1008 — 1008 5040 756 | —§292 | 15120 | —22680 | 83160 498960 — 9979200 
52°1]/n‘*) 336 J — 1008 <7 756 | — 2268 Al — 22680 — 41580 1995840 
531‘}/n® —56 —448 1680 1008 — 1008 | 5040 | — 15120 | 15120 221760 — 3326400 
$3217) /n" 112 504 —2016 . | 1008 | — 7560 5040 — 20160 — 166320 3991680 } 

| | 

$32*)/n™ —56 5 168 ‘ | ‘ 756 | 5040 ; — 498960 
53°1)/n* 3 —56 22. as y 1008 ‘ ‘ 9240 — 332640 
541°]/n‘ : —33 — 126 | 126 | —252 | 2520 —2520 —41580 665280 
5421) /n\*) } 336 “” —126 378 | ‘ 2520 13860 — 498960 

43] n() —56 aa —126 - | 7 55440 

1*)/n™ | j a - | —126 | 126 2772 — 49896 
Fa haa = a : <4 ; . oil — 126 ‘ = 
1*}/n™ 56 112 — 336 — 504 | 504 — 1008 | 5040 — 5040 — 55440 665280 ! 
621*}/n —140 | —224 840 504 — 1008 2520 —7560 | 12600 110880 — 1663200 
62*1*)/n\®) 112 | 84 — 504 ‘ | 504 —1§12 1260 —8820 — 41580 997920 

| | 

62°) /n\*) —28 | 84 a | i 1260 ; —83160 
631°) /n™ > 56 — 224 —84 84 —672 1680 | — 1680 — 27720 443520 
6321]/n™ | —28 112 : —84 756 | ou 1680 | 9240 — 332640 
63°)/n™ ’ ‘ ef | —84 | , . ‘ 18480 
6417)/n™ a i 56 : | ; —210 210 4620 —83160 
642)/n“) . | ; —28 ef : . | —2I0 | ° 27720 
S5r]/n® ‘ ‘ 2 ; off a] — 462 11088 
68) /n(®) . | tf : | | ; — 462 
71°) /n® | —8 | —16 48 | 72 —72 | 144 —720 | 720 7920 — 95040 
721°) /n 16 24 —96 | —36 | 108 — 288 720 — 1440 — 11880 190080 
972"1)/n™ —8 | . 2. af —36 108 | ol 720 1980 — 71280 
7317) /n™ | | =§ | 32 | P : 72 —120 | 120 2640 — 47520 
732) /n | | ; | ot 4 : —36 | . —120 : 15840 
741} /n*) | | —8 | ei : E ; A —330 7920 
rallies | | 5 at | . et a " ° —792 

1)/n® 1 | 2 | -6 —9 | 9 —18 | go | —90 —990 11880 
8217)/n™ | —2 —3 | 12 2] -9 27 | —45 | 135 99° — 17820 
82") /n') zr} | —3 | | > | | : —45 : 2970 
831) /n™ I | —4 } —9 | é —165 3960 
84)/n 3 4 | x ; —495 , 
g1*)}/n at r | -1 2 | —10 10 110 — 1320 
921] /n } bial } I I —3 | a —10 —55 1320 
93)/n" 9 | r | 3 | I | 5 ; —220 

10, 1*)/n") . | 10 | | ai r a —11 132 
10, 2]/n™ 45 | = 10 Io | I I —66 

11, 1}/n™ of 165 | ‘ 55 55 | | II 3 I —12 
12}/n | 1485 | 1980 495 220 660 | 220 | 66 66 12 Z 


























An angular transformation for the serial correlation coefficient


By G. M. JENKINS 
University College, London 


1. INTRODUCTION AND SUMMARY 


Quenouille (1948) has investigated the effect of applying Fisher’s z-transformation to the distribution 
of the serial correlation coefficient of lag 1 in a circularly correlated universe. Apart from the fact that 
when p (the correlation in the parent population) is zero, the distributions of the serial correlation 
coefficient and the ordinary correlation coefficient have the same form, there does not seem to be any 
a priori reason for adopting this procedure. In fact, when $p \neq 0$ the distributions are quite distinct.

He deduces that ‘compared with the transformation of the ordinary correlation coefficient, this
transformation has the disadvantage that the variance of $z = \tanh^{-1} r$ is to the first order dependent on
the value of p. Furthermore, for | p | large, the mean value of z deviates widely from $\zeta = \tanh^{-1} p$.’

It is emphasized that the distribution is the appropriate one in the linear Markoff case only, viz. when 


$$x_t = p\,x_{t-1} + \epsilon_t, \quad\text{and } \epsilon_t \text{ is a } N(0, \sigma) \text{ variate.} \qquad (1)$$


Hence a transformation and a test of significance based thereon can only be applied once it is established 
that (1) is the correct mathematical model for the situation. 

An inverse sine transformation is proposed* which is designed to stabilize the variance, and it is found 
that the limiting distribution of the transformed variate is approximately normal. The mean value is 
still sensitive to fluctuations in p as the latter tends to unity, but this is not so great as in the $\tanh^{-1} r$
transformation. This raises no serious practical problems, and it is shown that a simple test may be
applied for values of p as large as 0.9.

Approximate confidence intervals are suggested which are in close agreement with those obtained by 
Quenouille (1949) using a different argument. 


2. DERIVATION OF THE TRANSFORMATION 


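Leipnik (1947) derives the following distribution for the serial correlation coefficient of lag 1 calculated
from a sample of n:

$$h(r) = \frac{\Gamma(\tfrac12 n+1)}{\Gamma(\tfrac12 n+\tfrac12)\,\Gamma(\tfrac12)}\,(1-r^2)^{\frac12(n-1)}\,(1+p^2-2pr)^{-\frac12 n}, \qquad (2)$$

where
$$r = \frac{x_1x_2 + x_2x_3 + \dots + x_nx_1}{x_1^2 + x_2^2 + \dots + x_n^2}$$
and the x’s are independently and normally distributed about zero.
The mean and variance of r are given by

$$\mathcal{E}(r) = a = \frac{np}{n+2}, \qquad (3)$$

$$\operatorname{var} r = \frac{1}{n+2} - \frac{n(n-2)\,p^2}{(n+2)^2(n+4)} \qquad (4)$$

$$= \frac{1}{n+2}\,(1 - Aa^2),$$

where
$$A = \frac{(n+2)(n-2)}{n(n+4)} \approx 1.$$

Hence
$$\operatorname{var} r \approx \frac{1}{n+2}\,(1-a^2). \qquad (5)$$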
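The moments quoted in (3) and (4) can be checked numerically against the density (2). The following is a minimal sketch, not part of the original paper: the function names and the use of Simpson's rule are illustrative assumptions, and the closed forms tested are those of (3) and (4) as written above.

```python
import math


def leipnik_density(r, n, p):
    """Density (2) of the circular lag-1 serial correlation coefficient."""
    log_c = math.lgamma(n / 2 + 1) - math.lgamma(n / 2 + 0.5) - math.lgamma(0.5)
    return math.exp(log_c) * (1 - r * r) ** ((n - 1) / 2) * (1 + p * p - 2 * p * r) ** (-n / 2)


def simpson(f, a, b, m=20000):
    """Composite Simpson rule with an even number m of sub-intervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def leipnik_moments(n, p):
    """Return (total mass, mean, variance) of r by numerical integration."""
    mass = simpson(lambda r: leipnik_density(r, n, p), -1.0, 1.0)
    mean = simpson(lambda r: r * leipnik_density(r, n, p), -1.0, 1.0)
    second = simpson(lambda r: r * r * leipnik_density(r, n, p), -1.0, 1.0)
    return mass, mean, second - mean ** 2


def closed_form_moments(n, p):
    """Mean (3) and variance (4) in closed form."""
    mean = n * p / (n + 2)
    var = 1 / (n + 2) - n * (n - 2) * p * p / ((n + 2) ** 2 * (n + 4))
    return mean, var
```

For moderate n the two routes should agree to several decimal places, which also illustrates how close the variance is to the approximation (5).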





* Since writing this paper, the author has learned that this transformation was suggested by 
F. Chartier (Publications de L’ Institut de Statistique de L’ Université de Paris, Vol. 1—Fascicule IV— 
1952) and also by Prof. M. G. Kendall in the course of a discussion on Prof. Hotelling’s paper read 
to the Research Section of the Royal Statistical Society in May 1953. 
We seek a transformation y = f(r) such that the variance of y is approximately independent of p. Now

$$\operatorname{var} y \approx \operatorname{var} r \left(\frac{dy}{dr}\right)^2_{r=a},$$

and if the variance of y is to be a function of n and not of p we have, from (5),

$$f(a) = k\int \frac{da}{\sqrt{(1-a^2)}} = k\sin^{-1} a,$$

whence $r = \sin (y/k)$.
Since the constant k has no effect on the form of distribution of y, we may take

$$r = \sin y. \qquad (7)$$


3. THE LIMITING DISTRIBUTION OF THE TRANSFORMED VARIATE 
Let $r = \sin y$, $p = \sin\lambda$, $y - \lambda = x$.
For large n it may be shown that $r - p = O(n^{-1/2})$. Therefore, since $\sin\theta \to \theta$ when $\theta$ is small,
$y - \lambda = x = O(n^{-1/2})$.
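We apply the transformation (7) to h(r) as given in (2) and obtain

$$h(x) = \frac{\Gamma(\tfrac12 n+1)}{\Gamma(\tfrac12 n+\tfrac12)\,\Gamma(\tfrac12)}\,\cos^n x\,(1-\theta\tan x)^n \left[\frac{1+p^2-2p^2\cos x-2p(1-p^2)^{1/2}\sin x}{1-p^2}\right]^{-\frac12 n}, \qquad (9)$$

where $\theta = p/(1-p^2)^{1/2}$.

We need the following expansions:

$$\frac{\Gamma(\tfrac12 n+1)}{\Gamma(\tfrac12 n+\tfrac12)\,\Gamma(\tfrac12)} = \left(\frac{n}{2\pi}\right)^{1/2}\left\{1 + \frac{1}{4n} + O\left(\frac{1}{n^2}\right)\right\}, \qquad (10a)$$

$$\log_e\{\cos x\} = -\tfrac12 x^2 - \tfrac1{12}x^4 + O\left(\frac{1}{n^3}\right), \qquad (10b)$$

$$\log_e\{1-\theta\tan x\} = -\theta(x + \tfrac13 x^3 + \tfrac2{15}x^5) - \tfrac12\theta^2(x^2 + \tfrac23 x^4) - \tfrac13\theta^3(x^3 + x^5) - \tfrac14\theta^4x^4 - \tfrac15\theta^5x^5 + O\left(\frac{1}{n^3}\right). \qquad (10c)$$

In a similar manner we may expand

$$\log_e\left\{\frac{1+p^2-2p^2\cos x-2p(1-p^2)^{1/2}\sin x}{1-p^2}\right\},$$

neglecting terms of higher order than $n^{-5/2}$.
Taking logarithms of both sides of (9) we obtain, after some manipulation,

$$\log_e\{h(x)\} = \tfrac12\log_e\frac{n}{2\pi} - \tfrac12 nx^2 - \frac12\frac{p}{(1-p^2)^{1/2}}\,nx^3 + \frac{1}{4n} - \frac1{24}\frac{(2+13p^2)}{1-p^2}\,nx^4 - \frac18\frac{p(1+5p^2)}{(1-p^2)^{3/2}}\,nx^5 + O\left(\frac{1}{n^2}\right),$$

from which it follows that

$$h(x) = \sqrt{\frac{n}{2\pi}}\,e^{-\frac12 nx^2}\left\{1 - \frac12\frac{p}{(1-p^2)^{1/2}}\,nx^3 + \frac{1}{4n} - \frac1{24}\frac{(2+13p^2)}{1-p^2}\,nx^4 + \frac18\frac{p^2}{1-p^2}\,n^2x^6 - \frac18\frac{p(1+5p^2)}{(1-p^2)^{3/2}}\,nx^5 - \frac18\frac{p}{(1-p^2)^{1/2}}\,x^3 + \frac1{48}\frac{p(2+13p^2)}{(1-p^2)^{3/2}}\,n^2x^7 - \frac1{48}\frac{p^3}{(1-p^2)^{3/2}}\,n^3x^9 + O\left(\frac{1}{n^2}\right)\right\}. \qquad (11)$$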
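4. MOMENTS AND COEFFICIENTS OF SKEWNESS AND KURTOSIS OF h(x)


From equation (11) we obtain the moments, as follows:

$$\mu_1' = -\frac32\,\frac{p}{(1-p^2)^{1/2}}\,\frac1n + \frac18\,\frac{p(17-2p^2)}{(1-p^2)^{3/2}}\,\frac1{n^2} + O\left(\frac{1}{n^3}\right), \qquad (12)$$

$$\mu_2 = \frac1n - \frac12\,\frac{(2-5p^2)}{1-p^2}\,\frac1{n^2} + O\left(\frac{1}{n^3}\right), \qquad (13)$$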
salts pias (ss): ci 
Ma apt? (a) cs) 
ee ER ot a} + (aa) 
= 3= == +0(5)- (17) 


These results may be compared with those of Quenouille (1948, p. 262) for the $\tanh^{-1}$ transformation.
The most noticeable difference is that in his case the second moment is not independent of p to the first
order. Further, his series for the mean commences with a term containing the factor $(1-p^2)^{-1}$ instead
of $(1-p^2)^{-1/2}$ as in equation (12), so that the mean value of $z = \tanh^{-1} r$ will diverge more from $\zeta = \tanh^{-1} p$
when | p | is large than will y from $\lambda$.


5. TESTS OF SIGNIFICANCE 


We assume that the theoretical model for the situation may be represented by the linear Markoff process
$x_t = p\,x_{t-1} + \epsilon_t$, where $\epsilon_t$ is a $N(0, \sigma)$ variable. It is desired to test the hypothesis that the population
correlation equals some value p.

The transformation $r = \sin y$, $p = \sin\lambda$ is such that we may take $y - \lambda$ to be approximately normally
distributed with mean and variance given by (12) and (13). The variance of y is independent of p to the
first order.

If we let
$$\mu_1' = \frac{\beta(p)}{n} + \frac{\gamma(p)}{n^2} + O\left(\frac{1}{n^3}\right), \qquad (18)$$

$$\mu_2 = \frac1n + \frac{\delta(p)}{n^2} + O\left(\frac{1}{n^3}\right), \qquad (19)$$

the stability of the mean and variance of y is illustrated by the following table:











   p       β(p)      γ(p)      δ(p)
  0.0      0.00      0.00     -1.00
  0.2     -0.31      0.43     -0.94
  0.5     -0.87      1.59     -0.50
  0.8     -2.00      7.27     +1.66
  0.9     -3.10     20.87     +5.39

















Suppose that we take n = 35. It may be seen that even for p = 0.5 the disturbances are negligible
and tend to annul one another since they are opposite in sign.

For values of p up to 0.9 the corrections are still small (being majorized by the first-order term), but
they would have to be allowed for in a test of significance. There is, however, a large increase in the value
of γ(p) for values of p between 0.9 and 1.0, so that it is highly unlikely that the transformation can be
used at all in this range.
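The size of these corrections is easy to tabulate. The sketch below is illustrative only: the closed forms used for β(p), γ(p) and δ(p) are an assumption, obtained by taking moments of the expansion (11), and are not quoted from the paper. They agree with the tabulated values to within rounding except for γ(0.2), which they give as about 0.45 against the tabulated 0.43.

```python
import math


def beta(p):
    """Assumed first-order coefficient of the mean of y - lambda."""
    return -1.5 * p / math.sqrt(1 - p * p)


def gamma(p):
    """Assumed second-order coefficient of the mean of y - lambda."""
    return p * (17 - 2 * p * p) / (8 * (1 - p * p) ** 1.5)


def delta(p):
    """Assumed second-order coefficient of the variance of y."""
    return -(2 - 5 * p * p) / (2 * (1 - p * p))


def mean_and_variance_of_y(p, n):
    """Approximate mean and variance of y = arcsin(r), following (18) and (19)."""
    lam = math.asin(p)
    mean = lam + beta(p) / n + gamma(p) / n ** 2
    var = 1 / n + delta(p) / n ** 2
    return mean, var


if __name__ == "__main__":
    for p in (0.0, 0.2, 0.5, 0.8, 0.9):
        print(f"{p:.1f} {beta(p):8.2f} {gamma(p):8.2f} {delta(p):8.2f}")
```

With n = 35 and p = 0.5, for instance, the first-order shift β(p)/n in the mean is about -0.025 and the second-order shift γ(p)/n² is of opposite sign and far smaller.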
This raises no serious practical problems, since in the case of a single series it will not often be required
to test whether the population value of p is equal to some number exceeding 0.9. It is, however, possible
that the equality of values of p in two series, each with high serial correlation, may be the subject of a
test. It appears that the transformation would not be suitable in such a case.

The variance as given by (13) is very stable, e.g. it alters from $1/n - 1/n^2$ to $1/n - 1/(2n^2)$ in the range
0.0 to 0.5. These are approximately given by 1/(n+1) and 1/(n+2) respectively. We may thus take $y - \lambda$
to be approximately normally distributed with zero mean and variance 1/(n+1) (provided that p is small
and n is reasonably large).

If it is known that p lies in the range 0.0 to 0.5, then an average correction may be applied to the
variance in the form 1/(n+1.5).


6. APPROXIMATE CONFIDENCE INTERVALS 


Quenouille (1949) has obtained confidence intervals for p by considering a transformation of the dis- 
tribution function of r into a form which can be expanded in terms of Student’s ‘t’. This procedure was 
necessary since it would be difficult to construct confidence intervals for p using the $r = \tanh z$ transformation
since the variance of z is dependent on p. These intervals are exact in the sense that they are
derived without approximation from (2), but approximate in that (2) is only a smoothed form of the 
exact distribution of r. They are, however, asymmetrical with respect to probabilities, the degree of 
asymmetry being a function of p. 

The inverse sine transformation results in confidence intervals which, although they are not exact
since y is only approximately normally distributed, are symmetrical and are smaller on the average.
When n is reasonably large, they are in close agreement with those obtained by Quenouille, viz.

$$p = r \pm t_\alpha\,\frac{(1-r^2)^{1/2}}{\sqrt{(n+1)}}. \qquad (20)$$

Using the fact that $y - \lambda$ is approximately a $N(0, (n+1)^{-1})$ variate, we have that

$$P\{-Z_\alpha < \sqrt{(n+1)}\,(\sin^{-1} r - \sin^{-1} p) < Z_\alpha\} = 1 - 2\alpha,$$

which results in approximate confidence intervals for p in the form

$$p = r\cos\frac{Z_\alpha}{\sqrt{(n+1)}} \pm (1-r^2)^{1/2}\sin\frac{Z_\alpha}{\sqrt{(n+1)}}. \qquad (21)$$

Using a result given by Peiser (1945), viz. that

$$t_\alpha = Z_\alpha + \frac{(Z_\alpha^3 + Z_\alpha)}{4\nu} + O\left(\frac{1}{\nu^2}\right) \qquad (22)$$

($\nu$ denoting the degrees of freedom of t), and assuming that n is large enough to take

$$\sin\frac{Z_\alpha}{\sqrt{(n+1)}} \approx \frac{Z_\alpha}{\sqrt{(n+1)}}, \qquad \cos\frac{Z_\alpha}{\sqrt{(n+1)}} \approx 1,$$

(20) and (21) reduce to

$$p = r \pm \frac{Z_\alpha(1-r^2)^{1/2}}{\sqrt{(n+1)}}.$$

This result lends support to the assumption of normality of y.
If we define $L_1$ and $L_2$ to be interval lengths in (20) and (21), we obtain

$$L_1 = \frac{2t_\alpha(1-r^2)^{1/2}}{\sqrt{(n+1)}}, \qquad L_2 = 2(1-r^2)^{1/2}\sin\frac{Z_\alpha}{\sqrt{(n+1)}}.$$

Let $\mathcal{E}\{(1-r^2)^{1/2}\} = \phi(p, n)$. Then

$$\mathcal{E}(L_2) = 2\phi(p,n)\sin\frac{Z_\alpha}{\sqrt{(n+1)}} < 2\phi(p,n)\frac{Z_\alpha}{\sqrt{(n+1)}} < 2\phi(p,n)\frac{t_\alpha}{\sqrt{(n+1)}} = \mathcal{E}(L_1).$$
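The interval (21) is simply the back-transform of $y \pm Z_\alpha/\sqrt{(n+1)}$, which makes it straightforward to compute. The following is a minimal sketch; the function names, the default normal deviate 1.959964 for 95% limits, and the clamping of the angle to the range of the inverse sine are illustrative assumptions rather than part of the paper.

```python
import math


def arcsine_interval(r, n, z=1.959964):
    """Approximate confidence limits for p from (21): sin(arcsin(r) -/+ z / sqrt(n + 1))."""
    half = z / math.sqrt(n + 1)
    y = math.asin(r)
    lower = max(y - half, -math.pi / 2)  # keep the angle inside [-pi/2, pi/2]
    upper = min(y + half, math.pi / 2)
    return math.sin(lower), math.sin(upper)


def linearised_interval(r, n, z=1.959964):
    """Large-n form to which (20) and (21) both reduce."""
    half = z * math.sqrt(1 - r * r) / math.sqrt(n + 1)
    return r - half, r + half


if __name__ == "__main__":
    print(arcsine_interval(0.544, 50))
    print(arcsine_interval(0.697, 500))
```

With r = 0.544 and n = 50 this gives limits close to 0.296 and 0.751, and with r = 0.697 and n = 500 limits close to 0.632 and 0.757.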


7. EXPERIMENTAL RESULTS 


Kendall (1949) has constructed artificial linear autoregressive series of the form $x_t = p\,x_{t-1} + \epsilon_t$, where
$\epsilon_t$ is a N(0, 1) variate. His Series 9 (for which p = 0.7) was divided into ten subseries, each containing
fifty members. Circular serial correlation coefficients of lag 1 were computed for the whole series and for
each of the ten subseries, the values of x being taken about the theoretical mean zero.

(a) 95% confidence intervals for p were obtained from the eleven values of r, using Quenouille’s
method (I) and that based on the $\sin^{-1}$ transformation (II).

(b) Estimates of the mean value and variance of $y = \sin^{-1} r$ were obtained from the ten values of
y from the subseries and compared with the theoretical values calculated from (12) and (13).























The results derived for (a) were as follows:

                                     95 % confidence limits
  Subseries       r         y        Method I        Method II
      1         0.544    0.5752    0.308-0.780     0.296-0.751
      2         0.644    0.6997    0.430-0.858     0.412-0.827
      3         0.790    0.9110    0.618-0.962     0.595-0.927
      4         0.651    0.7089    0.438-0.864     0.421-0.832
      5         0.683    0.7519    0.481-0.885     0.459-0.855
      6         0.618    0.6662    0.397-0.839     0.382-0.808
      7         0.769    0.8773    0.589-0.949     0.567-0.914
      8         0.618    0.6662    0.397-0.839     0.382-0.808
      9         0.578    0.6163    0.349-0.807     0.335-0.778
     10         0.784    0.9011    0.610-0.958     0.587-0.923
  Whole series  0.697    0.7712    0.634-0.760     0.632-0.757














It will be seen that for the whole series, Quenouille’s method resulted in the confidence limits (0.634,
0.760) for p, whilst the $\sin^{-1}$ transformation yielded limits (0.632, 0.757). The agreement is close, as
might be expected for a series containing 500 members.
In the case of those based on the subseries there is a distinct displacement of the confidence interval
as a result of using the $\sin^{-1}$ transformation. This shift is very nearly the same for all ten intervals, being
on the average, 0.032 for the upper limit and 0.018 for the lower. This was to be expected since the intervals
obtained by method I were asymmetrical with respect to probabilities; the translation results in biased
intervals which are symmetrical. It is also noted that each interval length in method II is less than the
corresponding interval obtained by method I, the average interval lengths being 0.400 and 0.412
respectively.
The results for comparison (b) were as follows: 





               Expected value    Calculated value
  Mean             0.743              0.737
  Variance         0.0203             0.0144

















The expected and calculated values of the mean agree very closely, whilst a $\chi^2$-test of significance
showed that the calculated value of the variance was in reasonable accord with the expected value as
obtained from equation (13).


In conclusion, I wish to express my warm thanks to Dr N. L. Johnson for his aid during the preparation 
of this work. 
A distribution arising in the study of infectious diseases


By J. O. IRWIN
Statistical Research Unit of the Medical Research Council, London School of Hygiene 


The late Prof. Greenwood was for many years interested in the infectiousness of measles. During his 
study of the distribution of numbers of cases of measles in households he developed the chain-binomial
distribution (Greenwood, 1931). This appeared to give a good fit to the distribution of the number of 
secondary cases. However, further analysis by E. B. Wilson and his colleagues (1938, 1947) and by 
Greenwood himself (1949) showed that chain-binomials were not satisfactory. Greenwood then con- 
sidered another model. 

Suppose that n children are exposed to risk in a household and each child has a probability p of being
attacked, but that these probabilities are not independent but correlated. The simplest assumption is
that all these correlations are equal, say r. Greenwood saw that the variance of the number attacked
would be npq{1+(n-1)r}. This result is easily obtained, for if

$x_u = 1$ if uth child is attacked,
$x_u = 0$ if uth child is not attacked,
$\sigma_u^2 = pq$ for all u,

we require the variance of $\sum_{u=1}^{n} x_u$, and this is

$$\sum_{u=1}^{n}\sigma_u^2 + \sum_{u\neq v} r\sigma_u\sigma_v = npq + n(n-1)rpq = npq\{1+(n-1)r\}.$$

He asked me whether one could obtain the frequency distribution of the number attacked. The following
solution of the problem is simpler than the one I originally obtained.

Let $P_1(k \mid r)$ be the probability that when $x_1 = 1$, out of r other children k will be attacked. Let $P_0(k \mid r)$
be the probability of the same occurrence when $x_1 = 0$. Let P(k, n) be the probability that k out of
n are attacked. Then
$$P(k, n) = pP_1(k-1 \mid n-1) + qP_0(k \mid n-1). \qquad (1)$$
Thus if we can obtain $P_1(0 \mid 1)$, $P_0(0 \mid 1)$, $P_1(1 \mid 1)$, $P_0(1 \mid 1)$ the distribution can be built up.

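Consider $x_1$ and $x_2$, and suppose that their joint frequency distribution is

                    x_1
                 0        1
   x_2    0    p_00     p_10   |   q
          1    p_01     p_11   |   p
               -----------------
                 q        p    |   1

The regression of $x_2$ on $x_1$ is r, for the two standard deviations are equal. When $x_1 = 0$ the mean value
of $x_2$ is $p_{01}/q$, when $x_1 = 1$ it is $p_{11}/p$. Thus
$$(p_{11}/p) - (p_{01}/q) = r$$
and $$p_{11} + p_{01} = p.$$
Thus $$p_{01} = p_{10} = pq(1-r), \quad p_{00} = q^2 + pqr, \quad p_{11} = p^2 + pqr,$$
and the frequency distribution is

                    x_1
                 0             1
   x_2    0    q^2 + pqr     pq(1-r)     |   q
          1    pq(1-r)       p^2 + pqr   |   p          (2)
               ---------------------------
                 q             p         |   1

Thus
$$P_1(1 \mid 1) = p + qr, \quad P_0(1 \mid 1) = p(1-r), \quad P_1(0 \mid 1) = q(1-r), \quad P_0(0 \mid 1) = q + pr. \qquad (3)$$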
Now consider the ordinary hypergeometric distribution, whose generating function is

$$\frac{(Nq)^{(n)}}{N^{(n)}}\,F(-n, -Np, Nq-n+1, \lambda).$$

It satisfies the recurrence formula (1) and

$$P_1(1 \mid 1) = \frac{Np-1}{N-1}, \quad P_0(1 \mid 1) = \frac{Np}{N-1}, \quad P_1(0 \mid 1) = \frac{Nq}{N-1}, \quad P_0(0 \mid 1) = \frac{Nq-1}{N-1}. \qquad (4)$$

(4) and (3) may be identified by taking $r = -1/(N-1)$, but r is positive, hence $N = -(1-r)/r$, which is
negative. In the usual case of the hypergeometric distribution, N is a positive integer; here it is negative
and not necessarily an integer. Thus the distribution is

$$\frac{(Nq)^{(n)}}{N^{(n)}}\,F(-n, -Np, Nq-n+1, \lambda) \quad\text{with}\quad N = -\left(\frac{1-r}{r}\right). \qquad (5)$$

For example, for n = 2 we find

$$\frac{Nq(Nq-1)}{N(N-1)}\left\{1 + \frac{2Np\lambda}{1!\,(Nq-1)} + \frac{Np(Np-1)\lambda^2}{Nq(Nq-1)}\right\},$$

and this easily reduces to

$$q(q+rp)\left\{1 + \frac{2p(1-r)}{(q+rp)}\lambda + \frac{p(p+rq)}{q(q+rp)}\lambda^2\right\}, \qquad (6)$$

and the three frequencies are

   0    q^2 + pqr,
   1    2pq(1-r),
   2    p^2 + pqr,

as is obvious from (2).
For n = 3 we find

   0    q(q+rp)(q+rp+r)/(1+r),
   1    3pq(q+rp)(1-r)/(1+r),
   2    3pq(p+rq)(1-r)/(1+r),          (7)
   3    p(p+qr)(p+qr+r)/(1+r).


The case when N is negative but an integer arises in the ordinary model of drawing balls from a bag, 
when after each ball is drawn it is replaced and an additional ball of the same colour inserted. However, 
the present case is much more general than that. 
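The frequencies for general n can be computed without evaluating the hypergeometric function. The sketch below is an illustration, not part of the original note: it assumes the equivalent product (urn) form of the distribution, with the hypothetical parameter theta = r/(1-r) standing for the contagion added at each draw, and the function names are invented for the example.

```python
from math import comb


def correlated_binomial(n, p, r):
    """Probabilities of k = 0..n attacks for attack probability p and common correlation r (0 <= r < 1)."""
    q = 1 - p
    theta = r / (1 - r)  # assumed urn parameter; corresponds to N = -(1 - r) / r
    norm = 1.0
    for m in range(n):
        norm *= 1 + m * theta
    probs = []
    for k in range(n + 1):
        term = float(comb(n, k))
        for i in range(k):
            term *= p + i * theta
        for j in range(n - k):
            term *= q + j * theta
        probs.append(term / norm)
    return probs


def mean_and_variance(probs):
    """Mean and variance of a distribution given on 0, 1, ..., len(probs) - 1."""
    mean = sum(k * pk for k, pk in enumerate(probs))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
    return mean, var
```

For n = 2 and n = 3 the output can be compared term by term with the frequencies listed above, and for any n the variance should equal npq{1+(n-1)r}.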

We might change our model somewhat taking a definite period T of exposure to risk and dividing it
into n equal intervals (n large). We can suppose that a single case may occur in any interval with
probability p, but that these probabilities are correlated with correlation r as before. Then the distribution
(5) will hold. Now put np = m, nr = α, and let $n \to \infty$, $p \to 0$, $r \to 0$, in such a way that m and
α remain finite. In this case the generating function of the distribution becomes

$$(1+\alpha)^{-m/\alpha}\left(1 - \frac{\alpha\lambda}{1+\alpha}\right)^{-m/\alpha},$$

or the negative binomial distribution. The simplest way to show this is to consider the factorial moments.
Greenwood (1949) fitted the distribution (7) to 100 sets of three cases (Wilson and Worcester’s data).
p was estimated by equating 3p to the observed mean and npq{1+(n-1)r} to the variance. The result is:

     Obs.      Exp.
       4       4.44
      11       9.68
      18      19.32
      67      66.56

     100     100

$\chi^2 = 0.331$ ($\nu = 1$), P = 0.57.

This is a good fit, but is the only example available so far of fitting actual data with this distribution. 
Naturally much more experience is needed before we can conclude that it is really satisfactory. 
A note on contagious distributions


By H. R. THOMPSON 
Applied Mathematics Laboratory, D.S.I.R., Wellington, New Zealand 


1. Contagious distributions have been applied to ecological problems connected with the spread and 
distribution of plants over an area, where we have initial or ‘parent’ plants, and others (‘offspring’) 
associated with them in such a way as to render the use of the simple Poisson distribution as a description 
of the spreading process invalid. Earlier work in ecology was based on the assumption of a random 
distribution of plants, but this has generally been found to be an erroneous assumption, due often to the 
associative properties of neighbouring plants. Random samples of quadrats showed that in comparison 
with Poisson expectation there was an excess of plots with no plants and with a large number, a property 
described as ‘contagion’. Neyman (1939) developed distributions with this property in the fields of 
bacteriology and entomology, but they are equally applicable to ecology. Thomas (1949) invented the 
double Poisson distribution to describe non-random distributions of plants. This note discusses another 
method of obtaining contagious distributions, due to Darwin (1951), of wider applicability, and draws 
attention to some shortcomings of the two earlier methods. Since this was written, a paper by Skellam 
(1952) has appeared, in which similar aspects of the problem have been discussed, but as the methods 
used here are different it was considered worth-while to present them without modification. 


2. The derivation of Darwin’s model is more formal than that of Neyman, and as far as possible
a notation consistent with his is employed. Assume a random distribution of parent plants or centres
with mean m per unit area, an assumption consistent on the whole with practical experience when an
area is first invaded by a species. Taking an arbitrary origin, and referring co-ordinates of plants, which
we assume to be dimensionless, to this origin, then for a parent at (ξ, η) the co-ordinate distances
x - ξ, y - η of an offspring at (x, y) obey a probability distribution f(x - ξ, y - η) dx dy of being in the
infinitesimal rectangle, area dx dy, left-hand lower corner at (x, y). Then for the test region R (which will
usually be a quadrat, of side h, say)

$$\Pr\{1 \text{ offspring in } R \mid \text{centre at } (\xi, \eta)\} = \iint_R f(x-\xi, y-\eta)\,dx\,dy = P(\xi, \eta) \text{ say.}$$

For a group, centre (ξ, η), with n offspring, the probability generating function (p.g.f.) of the number
in R (assuming individuals of a group to act independently) is

$$(P(\xi, \eta)z + 1 - P(\xi, \eta))^n,$$
the average value of which, given the probability distribution p(n) of the number n of offspring per
group, is
$$\sum_n p(n)(Pz + 1 - P)^n = f(z) \text{ say.}$$

Multiplying by the probability law of ξ and η and integrating over the whole area with respect to ξ and η,
Darwin obtains the p.g.f. for the number of offspring in R due to all centres, assuming all areas to
contribute independently to the number in R.

Now let
Pr(1 centre in dξ dη) = m dξ dη,
Pr(no centre in dξ dη) = 1 - m dξ dη,
Pr(>1 centre in dξ dη) = o(dξ dη).

Therefore the contribution to the p.g.f. from the small area dξ dη at (ξ, η) is, by the ordinary multiplication
law of probabilities,
$$1 + m(f(z) - 1)\,d\xi\,d\eta$$
(since the p.g.f. of the number in R when there is no centre at (ξ, η) is 1).
Taking logarithms, to the first order in dξ dη the log p.g.f. due to the small area dξ dη is
$$m(f(z) - 1)\,d\xi\,d\eta,$$
and hence log p.g.f. for the whole area F is, since centres are assumed to act independently,
$$m\iint_F (f(z) - 1)\,d\xi\,d\eta.$$
F 


In this, p(n) and P(ξ, η) are arbitrary. Actually, f(x - ξ, y - η) should be specified and P(ξ, η) calculated,
but since f(x - ξ, y - η) is also arbitrary, it is formally only necessary to specify P(ξ, η). A valid assumption
is that it is a circularly symmetric function P(r), so that transforming to polar co-ordinates and integrating
over θ,

$$\log \text{p.g.f.} = \int_0^\infty 2\pi m\,(f(z) - 1)\,r\,dr, \qquad (1)$$

in which $\int_0^\infty 2\pi P(r)\,r\,dr = 1$. (Note that this gives the distribution of the offspring alone. The p.g.f. for
centres and offspring combined would be obtained by multiplying the above p.g.f., π(z) say, by the
Poisson p.g.f. exp {m(z - 1)}. Moments are most easily obtained from the component cumulant generating
functions, which are additive.)

That the above derivation is entirely equivalent to Neyman’s model is seen by putting $p(n) = e^{-\lambda}\lambda^n/n!$,
$P(\xi, \eta) = A^{-1}$ inside a given finite area A and 0 outside, the values used by Neyman to obtain his Type A
contagious distribution. Then

$$f(z) - 1 = e^{\lambda(z-1)/A} - 1, \qquad \log \pi(z) = mA[e^{\lambda(z-1)/A} - 1].$$

Darwin’s model gives a much wider range of distributions, however, because functions P(ξ, η) can be
used in which dispersal is theoretically possible over the whole field. Such a function is the two-dimensional
Gaussian $e^{-\frac12 r^2/\sigma^2}/(2\pi\sigma^2)$, which might be considered an adequate approximation for describing
some dispersal mechanisms. The form of Neyman’s derivation limits the possibilities for P(ξ, η) to
dispersal inside a finite and fairly small area, with none outside. For, putting A, the area of the part of
the field in which P(ξ, η) > 0, equal to M, the total area of the field, in Neyman’s equations, we are led
to a limit of the form $\lim_{N\to\infty}(X/N)^N$ for the p.g.f., which equals zero.
N>o \N 


3. Darwin used his model to obtain a new method of generating the negative binomial distribution.
Take

$$P(r) = \frac{1}{2\pi\sigma^2}\,e^{-\frac12 r^2/\sigma^2}, \qquad p(n) = (1-p)p^n \quad (n = 0, 1, \dots).$$

Then integration of (1) gives
$$\pi(z) = (1 - \beta(z-1))^{-\alpha},$$
where $\alpha = 2\pi\sigma^2 m$, $\beta = p/\{2\pi\sigma^2(1-p)\}$.
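The integration can be checked numerically. The following sketch is illustrative and not taken from the note: it evaluates the radial integral (1) by Simpson's rule for the Gaussian P(r) and geometric p(n) and compares it with the logarithm of the negative binomial p.g.f. The function names, the truncation of the integral at 12σ and the step count are assumptions made for the example.

```python
import math


def log_pgf_numeric(z, m, sigma, p, steps=20000, r_max_sigmas=12.0):
    """Integral (1): int_0^inf 2*pi*m*(f(z) - 1)*r dr, with f(z) = 1 / (1 - c*P*(z - 1)), c = p / (1 - p)."""
    c = p / (1 - p)

    def integrand(r):
        big_p = math.exp(-0.5 * (r / sigma) ** 2) / (2 * math.pi * sigma ** 2)
        u = c * big_p * (z - 1)
        return 2 * math.pi * m * (u / (1 - u)) * r

    upper = r_max_sigmas * sigma  # the Gaussian tail beyond this is negligible
    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def log_pgf_closed(z, m, sigma, p):
    """Logarithm of (1 - beta*(z - 1))**(-alpha) with alpha and beta as defined in the text."""
    alpha = 2 * math.pi * sigma ** 2 * m
    beta = p / (2 * math.pi * sigma ** 2 * (1 - p))
    return -alpha * math.log(1 - beta * (z - 1))
```

The geometric sum used for f(z) converges for the values of z of interest here (0 ≤ z ≤ 1), and the two functions should then agree to high accuracy.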
In practice we would wish to use $e^{-\frac12 r^2/\sigma^2}/(2\pi\sigma^2)$ to represent the distribution of distances between the
plants themselves, i.e. have

$$f(x-\xi, y-\eta) = e^{-\frac12 r^2/\sigma^2}/(2\pi\sigma^2) \qquad (r^2 = (x-\xi)^2 + (y-\eta)^2).$$

The negative binomial is therefore an approximation, as it is necessary for P(ξ, η) to equal
$$e^{-\frac12 r^2/\sigma^2}/(2\pi\sigma^2)\,d\xi\,d\eta$$
($r^2 = \xi^2 + \eta^2$, assuming the origin to be at the centre of R) for all points in R. It would appear therefore
that R must be fairly small in relation to $\sigma^2$ to give a reasonable approximation. In fact, the approximation
is fairly good so long as h/σ is not greater than about 4.


4. We proceed to obtain from Darwin’s model some examples of contagious distributions by substituting
arbitrary functions for p(n) and P(r) in (1). The differing degrees of contagion can be compared
by means of the index of dispersion $\mu_2/\mu_1'$, or in other words by comparing second moments when the
means are adjusted to be equal. This method is used in preference to a consideration of individual
probabilities of occurrence ($\pi_0, \pi_1, \pi_2, \dots$) as giving a better overall idea of the situation. The following
four common distributions for p(n) are used: Poisson [$e^{-\lambda}\lambda^n/n!$], binomial [$\binom{N}{n}p^nq^{N-n}$, n = 0, 1, ..., N],
geometric [$(1-\gamma)\gamma^n$], and negative binomial [$(1-\alpha)^2(n+1)\alpha^n$]. Further, for P(r) we take the forms:

$$\frac{1}{2\pi\sigma^2}e^{-\frac12 r^2/\sigma^2}, \quad \frac{1}{2\pi\sigma^2}e^{-r/\sigma}, \quad \frac{3}{\pi R^3}(R-r), \quad \frac{6}{\pi R^4}r(R-r), \quad \frac{1}{\pi R^2}$$

(the last three inside a circle of radius R, and zero outside). We then obtain by integration exact forms
for the p.g.f.’s, but except for the example given above, none of the others gives rise to common distributions.
They are mostly rather complicated.


Table 1. Values of the index of dispersion $\mu_2/\mu_1'$ ($\mu_1' = m\lambda$)

Forms of P(r) heading the columns:
(1) $e^{-\frac12 r^2/\sigma^2}/(2\pi\sigma^2)$; (2) $e^{-r/\sigma}/(2\pi\sigma^2)$; (3) $3(R-r)/(\pi R^3)$; (4) $6r(R-r)/(\pi R^4)$; (5) $1/(\pi R^2)$.

  p(n)                       E(n) = λ     (1)               (2)               (3)               (4)               (5)
  (1) (1-γ)γ^n               γ/(1-γ)      1+λ/(2πσ²)        1+λ/(4πσ²)        1+3λ/(πR²)        1+12λ/(5πR²)      1+2λ/(πR²)
  (2) (1-α)²(n+1)α^n         2α/(1-α)     1+3λ/(8πσ²)       1+3λ/(16πσ²)      1+9λ/(4πR²)       1+9λ/(5πR²)       1+3λ/(2πR²)
  (3) e^{-λ}λ^n/n!           λ            1+λ/(4πσ²)        1+λ/(8πσ²)        1+3λ/(2πR²) †     1+6λ/(5πR²)       1+λ/(πR²) *
  (4) C(N,n)p^n q^{N-n}      Np           1+(λ-p)/(4πσ²)    1+(λ-p)/(8πσ²)    1+3(λ-p)/(2πR²)   1+6(λ-p)/(5πR²)   1+(λ-p)/(πR²)

* Neyman’s Type A distribution. † Neyman’s Type C distribution.


Except between the last three columns there is no strict basis of comparison between the indices of
dispersion as they stand, since the first two functions permit theoretically infinite spreading. Of course
comparisons within any column can be made, from which it follows that the relative degrees of contagion
for the final distributions are in the same order as for the p(n), the indices of dispersion for which
are respectively 1+λ, 1+½λ, 1, 1-p. With the distributions in the last three columns there is a definite
‘clump’ size with area equal to πR², the size of the circle outside which no members of a group fall. For
the first two we can only specify some area containing a given percentage of the distribution as the clump
size, and in the table below the indices of dispersion have been made comparable by taking as the clump
size the area A containing on the average 95 % of the distribution. The values of A for the five P(r) are
approximately 5.99πσ², 22.50πσ², 0.75πR², 0.81πR² and 0.95πR².
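These 95 % areas, and the coefficients they lead to, can be reproduced with a few lines of code. The sketch below is illustrative only: the cumulative fractions within a scaled radius t (t = r/σ for the first two forms, t = r/R for the last three) and the values of the integral of 2πP(r)²r dr are worked out here from the five forms of P(r) and are assumptions of the example, as are the dictionary keys and function names.

```python
import math


def bisect_root(f, lo, hi, tol=1e-12):
    """Root of an increasing function f on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# name: (fraction of the distribution within scaled radius t, upper bracket for t,
#        integral of 2*pi*P(r)^2*r dr in units of 1/(pi*scale^2))
FORMS = {
    "gaussian": (lambda t: 1 - math.exp(-t * t / 2), 50.0, 1 / 4),
    "exponential": (lambda t: 1 - math.exp(-t) * (1 + t), 50.0, 1 / 8),
    "linear": (lambda t: 3 * t ** 2 - 2 * t ** 3, 1.0, 3 / 2),
    "quadratic": (lambda t: 4 * t ** 3 - 3 * t ** 4, 1.0, 6 / 5),
    "uniform": (lambda t: t * t, 1.0, 1.0),
}


def clump_area_95(name):
    """Area holding 95 % of a group, in units of pi*sigma^2 or pi*R^2."""
    cdf, hi, _ = FORMS[name]
    t = bisect_root(lambda x: cdf(x) - 0.95, 0.0, hi)
    return t * t


def poisson_row_coefficient(name):
    """Coefficient c in the index 1 + c*lambda/A for Poisson p(n)."""
    return clump_area_95(name) * FORMS[name][2]
```

The Poisson-row coefficients obtained in this way are roughly 1.50, 2.81, 1.12, 0.98 and 0.95; the geometric and negative binomial rows follow on multiplying by 2 and 3/2 respectively.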

A reasonable conclusion to be drawn from this modified table is that for various types of spreading 
mechanisms the resulting distributions all have approximately the same degree of contagion, or in other 
words, there will not be much difference in the forms of the distributions and the differences might even
be undetectable from samples. The only possible exception occurs in column (2), and this may be due
to the exponential form of P(r), with its slow falling off in frequency and consequent difficulty in defining
any sort of clump size.


Table 2. Modified values of the index of dispersion $\mu_2/\mu_1'$

                                    P(r)
  p(n)    (1)               (2)               (3)               (4)               (5)
  (1)     1+3.00λ/A         1+5.63λ/A         1+2.24λ/A         1+1.95λ/A         1+1.90λ/A
  (2)     1+2.25λ/A         1+4.22λ/A         1+1.68λ/A         1+1.47λ/A         1+1.43λ/A
  (3)     1+1.50λ/A         1+2.81λ/A         1+1.12λ/A         1+0.98λ/A         1+0.95λ/A
  (4)     1+1.50(λ-p)/A     1+2.81(λ-p)/A     1+1.12(λ-p)/A     1+0.98(λ-p)/A     1+0.95(λ-p)/A


We may note in connexion with similarity of distributions that this is further borne out by a study of
higher moments. For example, with the distributions in row (1), column (3) and row (2), column (1),
which have almost identical $\mu_2/\mu_1'$, we have the following values for $\mu_3/\mu_1'$ and $\mu_4/\mu_1'$:

[(1), (3)]:
$$\mu_3/\mu_1' = 1 + 6.73\lambda/A + 9.06(\lambda/A)^2,$$
$$\mu_4/\mu_1' = 1 + 15.70\lambda/A + 54.34(\lambda/A)^2 + 54.17(\lambda/A)^3;$$

[(2), (1)]:
$$\mu_3/\mu_1' = 1 + 6.74\lambda/A + 8.97(\lambda/A)^2,$$
$$\mu_4/\mu_1' = 1 + 15.73\lambda/A + 53.85(\lambda/A)^2 + 50.41(\lambda/A)^3.$$


5. Thomas’s double Poisson distribution has mean m(1+λ) and variance m(1+3λ+λ²) for the
distribution of parents and offspring, so that for the offspring only $\mu_1' = m\lambda$, $\mu_2/\mu_1' = 1+\lambda$. In general
therefore the degree of contagion will tend to be higher than for the distributions above if A is more
than 1 quadrat. This arises because Thomas assumes that all offspring of a group fall in the same quadrat
as their parent, which is obviously an unrealistic assumption whatever the dispersal mechanism because
of edge effects between neighbouring quadrats. It is possible with the radial Gaussian model to calculate
the theoretical correlations between the numbers of offspring in neighbouring quadrats. (The basic
theory enabling this calculation will be presented more fully in a later paper which will consider the
general relations between plants in the quadrats of a grid superimposed upon the sampling region.)
This is done in Table 3 for the correlations $\rho_{01}$ between adjacent quadrats in the same row (or equivalently,
column), and $\rho_{11}$ between diagonally adjacent quadrats, with λ = 3 (the value of m is immaterial) and
different values of h/σ. (p(n) is assumed to follow the Poisson distribution.)








Table 3

  h/σ        1          2          4          8
  A        18.8h²     4.7h²      1.2h²      0.3h²
  ρ_01     0.143      0.198      0.119      0.057
  ρ_11     0.114      0.095      0.023      0.005


Thus there is still an appreciable boundary effect due to averaging even if most of the group is expected
to fall within one quadrat; so that even if Thomas’s distribution does give adequate fits to data, it is at
best only an approximation to the facts and cannot be regarded as representing a very precise model for
the spread of plants.
A general expression for the mean in a simple stochastic epidemic
By H. W. HASKEY, South-east Essex Technical College 


1. A community contains n susceptibles into which a single infectious individual is introduced and 
a mild infection spreads by contact between its members. It is supposed that no cases are withdrawn 
by isolation or death and that no case becomes clear of infection during the main part of the epidemic. 

If $p_r(t)$ is the probability that there are r susceptibles still uninfected at time t, Bailey (1950) gives the
following equations for the epidemic process:

$$\frac{dp_r(t)}{dt} = \overline{(r+1)}\,p_{r+1}(t) - \bar{r}\,p_r(t) \qquad (r = 0, 1, 2, \dots, (n-1)),$$

and
$$\frac{dp_n(t)}{dt} = -n\,p_n(t), \qquad (1)$$

where $\bar{r} = r(n-r+1) = \overline{n-r+1}$. He shows that the rate of change of the mean number of infected at
time t in this stochastic process is

$$\sum_r \{\bar{r}f_r'(1)\,t + \bar{r}g_r'(1) - f_r'(1)\}\,e^{-\bar{r}t}, \qquad (2)$$

where $f_r'(x)$, $g_r'(x)$ are the derivatives with respect to x of polynomials $f_r(x)$, $g_r(x)$ satisfying the equations
$$x(1-x)f_r''(x) - n(1-x)f_r'(x) - \bar{r}f_r(x) = 0, \qquad (3)$$
$$x(1-x)g_r''(x) - n(1-x)g_r'(x) - \bar{r}g_r(x) = -f_r(x). \qquad (4)$$

Bailey obtained an expression for $f_r'(1)$ but not for $g_r'(1)$, and was unable to obtain a convenient one for
$g_r(x)$. If a series solution of (4) is attempted for n = 30 the coefficients of terms in the fifteen series
required are often of the order of 10'*, and as the value of one coefficient is used to determine the next, 
it is impracticable to evaluate $g_r'(1)$ (r = 1, 2, ..., 15) accurately enough with a machine of ordinary
capacity. 

The object of this paper is to obtain a general formula for $g_r'(1)$ for n even or odd, by means of which
the rates of change of the mean number of infected for n = 30 at various times are computed more 
accurately and readily. 


2. Bailey uses the Laplace transform and its inverse with respect to time given by

$$q_r(\lambda) = \int_0^\infty e^{-\lambda t} p_r(t)\,dt \quad (R(\lambda) > 0), \qquad p_r(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} e^{\lambda t} q_r(\lambda)\,d\lambda, \tag{5}$$

where c is positive and greater than the abscissae of all the residues. The transforms of the equations (1)
are taken, using the boundary conditions $p_r(0) = 1$ if $r = n$ and $p_r(0) = 0$ if $r \ne n$, leading to recurrence
relations between the q's and thence to the values, for even n,








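$$q_s = \frac{n!\,(n-s)!}{s!\,(\lambda+\bar{1})(\lambda+\bar{2})\cdots(\lambda+\overline{s-1})\,\{(\lambda+\bar{s})(\lambda+\overline{s+1})\cdots(\lambda+\overline{\tfrac12 n})\}^2} \qquad (\tfrac12 n \ge s > 0), \tag{6}$$

$$q_s = \frac{n!\,(n-s)!}{s!\,(\lambda+\bar{1})(\lambda+\bar{2})\cdots(\lambda+\overline{n-s+1})} \qquad (s > \tfrac12 n), \tag{7}$$

where, as before, $\bar{j}$ denotes $j(n-j+1)$.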

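The transform of the probability generating function

$$\Pi(x, t) = \sum_{s=0}^{n} x^s p_s(t) \tag{8}$$

is

$$\Pi^*(x, \lambda) = \sum_{s=0}^{n} x^s q_s(\lambda),$$

which by (6) and (7), can be written, for n even,

$$\Pi^*(x, \lambda) = \frac{1}{\lambda} + \sum_{r=1}^{\frac12 n}\left\{\frac{f_r(x)}{(\lambda+\bar{r})^2} + \frac{g_r(x)}{\lambda+\bar{r}}\right\}, \tag{9}$$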
where $f_r(x)$, $g_r(x)$ are the sums of coefficients of $1/(\lambda+\bar{r})^2$, $1/(\lambda+\bar{r})$ respectively in partial fraction
expansions of $\sum_s x^s q_s(\lambda)$. Bailey has shown that $f_r(x)$, $g_r(x)$ satisfy (3) and (4) and that

$$f_r'(1) = \frac{n!\,(n-2r+1)^2}{(n-r)!\,(r-1)!}.$$
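Still taking n even, we write

$$q_s = \sum_{r=1}^{s-1}\frac{A_{rs}}{\lambda+\bar{r}} + \sum_{r=s}^{\frac12 n}\left\{\frac{A_{rs}}{\lambda+\bar{r}} + \frac{B_{rs}}{(\lambda+\bar{r})^2}\right\} \qquad (\tfrac12 n \ge s), \tag{10}$$

$$q_s = \sum_{r=1}^{n-s+1}\frac{A_{rs}}{\lambda+\bar{r}} \qquad (s > \tfrac12 n), \tag{11}$$

and then

$$g_r(x) = \sum_{s=0}^{n-r+1} A_{rs}\,x^s,$$

giving

$$g_r'(1) = \sum_{s=1}^{n-r+1} s A_{rs}. \tag{12}$$

We proceed to find the $A_{rs}$ and use (12).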


3. Multiply (6) and (10) by $(\lambda+\bar{r})$ and then put $\lambda = -\bar{r}$, where $\bar{r} = r(n-r+1)$. On simplification it is found that

$$A_{rs} = \frac{(-)^{r-1}\,n!\,(n-s)!\,(s-r-1)!\,(n-2r+1)}{s!\,(r-1)!\,(n-r)!\,(n+1-r-s)!} \qquad (\tfrac12 n \ge s > r). \tag{13}$$

Treating (7) and (11) similarly the same result, (13), is obtained for $n-r+1 \ge s > \tfrac12 n$. From (7), $A_{rs} = 0$
if $s > n-r+1$. When $\tfrac12 n \ge s$ with $r \ge s$, there are squared factors in the denominator of (6), and it is seen
that

$$A_{rs} = \left[\frac{d}{d\lambda}\{q_s(\lambda+\bar{r})^2\}\right]_{\lambda=-\bar{r}}. \tag{14}$$

Noting that

$$\frac{1}{\bar{s}-\bar{r}} = \frac{1}{n-2r+1}\left\{\frac{1}{s-r} + \frac{1}{n+1-s-r}\right\},$$

it is found that

$$A_{rs} = \frac{(-)^{s}\,n!\,(n-s)!\,(n-2r+1)\left\{\sum_{p=r}^{n-r} p^{-1} + \sum_{p=r-s+1}^{n-r-s+1} p^{-1} - 2/(n-2r+1)\right\}}{s!\,(r-1)!\,(n-r)!\,(n+1-r-s)!\,(r-s)!}. \tag{15}$$

In (15) the first sum does not arise when s = 1 and the second does not arise if s = ½n.
Equations (13) and (15) give, for even n, all types of term on the right of (12).

4. From (13) the contribution to $g_r'(1)$ due to terms for which s > r is

$$B = (-)^{r-1}\frac{n!\,(n-2r+1)}{(n-r)!\,(r-1)!}\sum_{y=1}^{n-2r+1}\frac{(n-r-y)!\,(y-1)!}{(y+r-1)!\,(n+1-2r-y)!}, \tag{16}$$

where s − r = y. The general term in (16) can be put in partial fractions so that

$$\frac{(-)^{r-1}(n-r-y)!\,(y-1)!}{(y+r-1)!\,(n+1-2r-y)!} = \sum_{k=0}^{r-1}\frac{c_k}{y+k}, \tag{17}$$

where

$$c_k = \frac{(-)^{r+k-1}(n-r+k)!}{(n+1-2r+k)!\,(r-k-1)!\,k!}. \tag{18}$$

Writing the terms of $\sum_{k=0}^{r-1} c_k$ in the reverse order it is seen that

$$\frac{(n-r)!\,(r-1)!}{(n-1)!}\sum_{k=0}^{r-1} c_k = 1 - \frac{(n-r)^{(1)}(r-1)^{(1)}}{(n-1)^{(1)}\,1!} + \frac{(n-r)^{(2)}(r-1)^{(2)}}{(n-1)^{(2)}\,2!} - \cdots = F\{-(r-1), -(n-r); -(n-1); 1\} = \frac{(n-r)!\,(r-1)!}{(n-1)!},$$

where $a^{(j)} = a(a-1)\cdots(a-j+1)$. Hence

$$\sum_{k=0}^{r-1} c_k = 1. \tag{19}$$
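The partial-fraction step (17) and the sum (19) lend themselves to an exact numerical check. The following is a minimal sketch in Python using rational arithmetic; the function names `c` and `lhs17` are illustrative only and do not come from the paper, and the check covers a few small even n rather than proving the identities.

```python
from fractions import Fraction
from math import factorial


def c(k, n, r):
    """Coefficient c_k of equation (18), as an exact fraction."""
    sign = -1 if (r + k - 1) % 2 else 1
    return Fraction(sign * factorial(n - r + k),
                    factorial(n + 1 - 2 * r + k) * factorial(r - k - 1) * factorial(k))


def lhs17(y, n, r):
    """Left-hand side of equation (17) for a given y = s - r."""
    sign = -1 if (r - 1) % 2 else 1
    return Fraction(sign * factorial(n - r - y) * factorial(y - 1),
                    factorial(y + r - 1) * factorial(n + 1 - 2 * r - y))


def check(n):
    """Return True if (17) and (19) hold for every r = 1, ..., n/2 (n even)."""
    for r in range(1, n // 2 + 1):
        coeffs = [c(k, n, r) for k in range(r)]
        if sum(coeffs) != 1:                      # equation (19)
            return False
        for y in range(1, n - 2 * r + 2):         # equation (17)
            rhs = sum(ck / (y + k) for k, ck in enumerate(coeffs))
            if lhs17(y, n, r) != rhs:
                return False
    return True


if __name__ == "__main__":
    print(all(check(n) for n in (2, 4, 6, 10, 30)))
```

Because `Fraction` is exact, agreement here is not subject to the rounding difficulties mentioned in section 1 for the series solution.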
5. The other contribution to $g_r'(1)$ due to terms for which $s \le r$ is $\sum_{s=1}^{r} sA_{rs}$, where $sA_{rs}$, from (15) can,
from (18), be written

$$u v c_{r-s} + u c_{r-s}\sum_{p=r-s+1}^{n-r-s+1} p^{-1},$$

where

$$v = \sum_{p=r}^{n-r} p^{-1} - 2/(n-2r+1), \qquad u = -n!\,(n-2r+1)/\{(n-r)!\,(r-1)!\}.$$

Writing out the values of this for s = 1, 2, ..., r on separate lines as follows

$$u v c_{r-1} + u c_{r-1}[1/r + 1/(r+1) + \ldots + 1/(n-r)],$$

$$u v c_{r-2} + u c_{r-2}[1/(r-1) + 1/r + \ldots + 1/(n-r-1)],$$

$$\cdots$$

$$u v c_0 + u c_0[1/1 + 1/2 + \ldots + 1/(n-2r+1)],$$

and summing by columns there results

$$u v \sum_{k=0}^{r-1} c_k + u\sum_{y=1}^{n-2r+1}\left(\frac{c_0}{y} + \frac{c_1}{y+1} + \ldots + \frac{c_{r-1}}{y+r-1}\right)$$

or by (19),

$$u v + u\sum_{y=1}^{n-2r+1}\sum_{k=0}^{r-1} c_k/(y+k) \tag{20}$$

or

$$u v - B \tag{21}$$

by (16) and (17), B denoting the contribution (16). Then $g_r'(1)$, sum of (16) and (20), is by (21)

$$g_r'(1) = \frac{n!}{(n-r)!\,(r-1)!}\left\{2 - (n-2r+1)\sum_{p=r}^{n-r}\frac{1}{p}\right\} \qquad (r = 1, 2, \ldots, \tfrac12 n). \tag{22}$$


With n odd the foregoing method applies but there are modifications, e.g.

$$q_s = \frac{n!\,(n-s)!}{s!\,(\lambda+\bar{1})(\lambda+\bar{2})\cdots(\lambda+\overline{s-1})\,\{(\lambda+\bar{s})(\lambda+\overline{s+1})\cdots(\lambda+\overline{\tfrac12(n-1)})\}^2\,(\lambda+\overline{\tfrac12(n+1)})} \qquad (\tfrac12(n+1) > s),$$

and (15) is replaced by

$$A_{rs} = \frac{(-)^{s}\,n!\,(n-s)!\,(n-2r+1)\left\{\sum_{k=1}^{s-1}\frac{1}{v} + 2\sum_{k=s,\,k\ne r}^{\frac12(n-1)}\frac{1}{v} + \frac{4}{n-2r+1}\right\}}{s!\,(r-1)!\,(n-r)!\,(r-s)!\,(n+1-r-s)!}$$

for $\tfrac12(n-1) \ge r \ge s$, where $1/v = 1/(k-r) + 1/(n+1-k-r)$ and $\bar{j} = j(n-j+1)$. Equation (13) applies if s > r and, when
$r = \tfrac12(n+1)$, a summation analogous to that by which $\sum_{k=0}^{r-1} c_k$ was obtained is needed. When $s > \tfrac12(n+1)$,
(7) holds so that $f'_{\frac12(n+1)}(1)$ is zero. It is found that, for n odd,

$$g_r'(1) = \frac{n!}{(n-r)!\,(r-1)!}\left\{2 - (n-2r+1)\sum_{p=r}^{n-r}\frac{1}{p}\right\} \qquad (r = 1, 2, \ldots, \tfrac12(n-1)), \tag{23}$$

but

$$g'_{\frac12(n+1)}(1) = \frac{n!}{[\{\tfrac12(n-1)\}!]^2},$$

which is only half that given by the general formulae (22) and (23).

Thus the stochastic mean number uninfected at time t is

$$\sum_r \frac{n!}{(n-r)!\,(r-1)!}\left\{(n-2r+1)^2 t + 2 - (n-2r+1)\sum_{p=r}^{n-r}\frac{1}{p}\right\} e^{-\bar{r}t}.$$

In this formula r = 1, 2, ..., ½n if n is even, but if n is odd r = 1, 2, ..., ½(n+1) with the special end-value
given above.
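The closing formula of this section can be evaluated directly. The sketch below is one possible Python rendering, not the author's computing scheme: the function names are hypothetical, the coefficients are built exactly with `Fraction` and only converted to floating point at the end, and the end-value for odd n is taken from the expression stated after (23).

```python
from fractions import Fraction
from math import exp, factorial


def f_prime(n, r):
    """f_r'(1) = n!(n-2r+1)^2 / ((n-r)!(r-1)!); zero when n is odd and r = (n+1)/2."""
    return Fraction(factorial(n), factorial(n - r) * factorial(r - 1)) * (n - 2 * r + 1) ** 2


def g_prime(n, r):
    """g_r'(1) from (22)/(23), with the halved end-value when n is odd and r = (n+1)/2."""
    if n % 2 == 1 and 2 * r == n + 1:
        return Fraction(factorial(n), factorial((n - 1) // 2) ** 2)
    harmonic = sum(Fraction(1, p) for p in range(r, n - r + 1))
    coeff = Fraction(factorial(n), factorial(n - r) * factorial(r - 1))
    return coeff * (2 - (n - 2 * r + 1) * harmonic)


def mean_uninfected(n, t):
    """Stochastic mean number of susceptibles still uninfected at time t."""
    top = (n + 1) // 2          # n/2 for even n, (n+1)/2 for odd n
    total = 0.0
    for r in range(1, top + 1):
        r_bar = r * (n - r + 1)
        total += (t * float(f_prime(n, r)) + float(g_prime(n, r))) * exp(-r_bar * t)
    return total


if __name__ == "__main__":
    for t in (0.0, 0.05, 0.1, 0.2):
        print(t, mean_uninfected(30, t))
```

For n = 2 the equations (1) can be solved by hand, giving a mean of 2(1 + t)e^{-2t}, and for n = 3 they give (12t − 3)e^{-3t} + 6e^{-4t}; the sketch reproduces both, which is a useful check on the even and odd branches.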


6. The transform of (9) gives

$$\Pi(x, t) = 1 + \sum_{r=1}^{l}\{t f_r(x) + g_r(x)\}\,e^{-\bar{r}t}, \tag{24}$$

where $l = \tfrac12 n$ if n is even and $l = \tfrac12(n+1)$ if n is odd. Hence the rate of change of the mean number of
infected at time t is given by

$$-\frac{\partial}{\partial t}\left[\frac{\partial \Pi}{\partial x}\right]_{x=1},$$

the value of which is given by (2). For n = 30, the values of $\bar{r} f_r'(1)$ accurately and $-\bar{r} g_r'(1) + f_r'(1)$ to three
decimal places are given in Table 1.
Table 1

 r    $\bar{r} f_r'(1)$     $\{-\bar{r} g_r'(1) + f_r'(1)\} \times 10^3$
 1    756900                126829916
 2    36785340              4521346367
 3    639450000             66734940835
 4    6262809840            584644959146
 5    40849344900           3499281732477
 6    192917497500          15372766370623
 7    691895131200          51600518480155
 8    1938488760000         135195958672714
 9    4308712679700         278181087951791
10    7634438311500         443723391803482
11    10708043346000        520480237463232
12    11595627716400        364795469521989
13    9107736592500         77034391329189
14    4360935177900         659179172373612
15    558423072000          1077291176400000
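The two columns of Table 1 can be recomputed from the expression for $f_r'(1)$ and from (22). The following sketch, with a hypothetical helper `table1_row`, does so in exact rational arithmetic for even n; it is offered as a cross-check on the printed magnitudes rather than as the method used for the original table.

```python
from fractions import Fraction
from math import factorial


def table1_row(n, r):
    """Return (r_bar * f_r'(1), -r_bar * g_r'(1) + f_r'(1)) for even n, exactly."""
    r_bar = r * (n - r + 1)
    coeff = Fraction(factorial(n), factorial(n - r) * factorial(r - 1))
    d = n - 2 * r + 1
    f1 = coeff * d * d                                        # f_r'(1)
    harmonic = sum(Fraction(1, p) for p in range(r, n - r + 1))
    g1 = coeff * (2 - d * harmonic)                           # g_r'(1), equation (22)
    return r_bar * f1, -r_bar * g1 + f1


if __name__ == "__main__":
    for r in range(1, 16):
        first, second = table1_row(30, r)
        # Second column is shown multiplied by 10^3, as in Table 1.
        print(r, int(first), round(second * 1000))
```

The second quantity carries a sign: with this formula it comes out negative at r = 15, for instance, so the printed entries are best compared in absolute value.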
Chambers's shorter tables were used to give $e^{-x}$ to six decimal places enabling z, the rate of change
of the mean number of infected, to be found to the nearest integer for times > 0.08.

Table 2. Stochastic treatment, n = 30

Time   0     0.05   0.06   0.08   0.1    0.12   0.12   0.15   0.2    0.3    0.4
z      30    104    118    152    176    183    183    150    90     12     1

The deterministic expression for z is

$$\frac{n(n+1)^2 e^{(n+1)t}}{\{n + e^{(n+1)t}\}^2},$$

giving a maximum of ¼(n+1)² at time $\log_e n/(n+1)$.

Table 3. Deterministic treatment, n = 30

Time   0.005  0.01  0.02  0.03  0.05  0.06  0.07  0.09  0.1   0.12  0.13  0.15  0.18  0.2   0.25  0.28  0.1097
z      35     40    53    69    113   140   168   219   235   234   218   166   94    52    12    4     240¼
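The deterministic expression is simple enough to tabulate directly. A minimal Python sketch follows; the function name `z_deterministic` is an illustrative choice, and the time scale is assumed to be the same as that of equations (1).

```python
from math import exp, log


def z_deterministic(n, t):
    """Deterministic rate of change of the number infected: n(n+1)^2 e^{(n+1)t} / (n + e^{(n+1)t})^2."""
    e = exp((n + 1) * t)
    return n * (n + 1) ** 2 * e / (n + e) ** 2


if __name__ == "__main__":
    n = 30
    for t in (0.005, 0.01, 0.05, 0.1, 0.2, 0.28):
        print(t, round(z_deterministic(n, t)))
    t_max = log(n) / (n + 1)            # time of the maximum
    print(round(t_max, 4), z_deterministic(n, t_max))
```

At the time $\log_e n/(n+1)$ the exponential equals n, so the expression reduces to (n+1)²/4, which is 240¼ for n = 30, in agreement with the last column of the deterministic table.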


The deterministic and stochastic curves for z (n = 30) show the same characteristics as those for n = 10,
20; their maxima occur at about the same time, less than that for n = 20, and the difference between
the maxima exceeds that for n = 20.


REFERENCE
Bailey, N. T. J. (1950). Biometrika, 37, 193.


Some remarks on confidence or fiducial limits 
By THEODORE E. STERNE 
Ballistic Research Laboratories, Aberdeen Proving Ground, Maryland


If the probability of a 'success' is π, then the probability that there will be precisely r successes during
n trials is exactly

$$p_{n,r}(\pi) = \frac{n!}{r!\,(n-r)!}\,\pi^r (1-\pi)^{n-r}. \tag{1}$$

Therefore, the probability of obtaining a number of successes as probable as or less probable than
a particular number of successes s is equal to

$$P(\pi, n, s) = \sum p_{n,r}(\pi), \tag{2}$$
where the summation is over all values of r for which

$$p_{n,r}(\pi) \le p_{n,s}(\pi). \tag{3}$$

The probability of obtaining a value of P equal to or less than some particular and possible value, P′, of
P is equal to P′. From the preceding considerations, confidence or fiducial limits of π can be chosen,
corresponding to any values of n and s and to any desired value, ε, lying between 0 and 1, of P.

Consider the dependence of P(π, n, s) upon the variable π, for any value of s other than 0 or n. For
values of s differing from 0 and n it is clear that P(0, n, s) and P(1, n, s) are zero, and that there are some
values of π, between 0 and 1, for which P(π, n, s) is unity. The values of π for which P(π, n, s) is unity are
values for which $p_{n,s}(\pi)$ is the largest of all terms $p_{n,r}(\pi)$. It can readily be shown that there are exactly
n values of π between 0 and 1 at which P(π, n, s) is discontinuous. Such discontinuities correspond to the
'crossing' of $p_{n,s}(\pi)$ by other $p_{n,r}(\pi)$'s, and except at the n discontinuities P(π, n, s) is a continuous
function of π. When s is 0, the situation is somewhat different since then P(0, n, 0) is unity while P(1, n, 0)
is zero. When s equals n, P(0, n, n) is zero while P(1, n, n) is unity. Whatever may be the value of s, there
are just n values of π at which P(π, n, s) is discontinuous.

The preceding description of some of the properties of the P-function has been included to clarify, in
the reader's mind, the nature of the dependence of P(π, n, s) upon π. To select confidence or fiducial limits,

$$p_1 \le \pi \le p_2,$$

it is sufficient to consider the shortest interval of π that contains all the values of π for which

$$P(\pi, n, s) \ge \varepsilon, \tag{4}$$


corresponding to any particular and possible values of n and s, and to some desired value of ε. Corresponding
to any possible values n, s and ε it is always possible to find the shortest interval, ($p_1$, $p_2$), by
calculation. The lower limit $p_1$ is the smallest of all π's satisfying the relation (4), and the upper limit $p_2$
is the largest of all π's satisfying the relation (4). Such limits $p_1$ and $p_2$ always exist, as will be shown.
One may note first that when s is zero, $p_1$ is zero; and that when s is n, $p_2$ is unity.

The existence of $p_1$ and $p_2$ follows rigorously from the consideration that the set of π's satisfying (4)
has a lower bound, zero, and must therefore have a greatest lower bound, $p_1$. Similarly, the set of π's
satisfying (4) has an upper bound, unity, and must therefore have a least upper bound, $p_2$. It can
further be shown that because of the nature of the P-function and of its discontinuities, the numbers $p_1$
and $p_2$ must themselves satisfy (4). Because of the last-mentioned property of $p_1$ and $p_2$, it follows that

$$P(\pi, n, s) < \varepsilon \tag{5}$$

for all values of π lying outside the closed interval $p_1 \le \pi \le p_2$. Let us now arbitrarily choose some value
of ε between 0 and 1 and adopt the policy, whenever in an experiment some value of s has been observed,
of always asserting that

$$p_1 \le \pi \le p_2, \tag{6}$$

where $p_1$ and $p_2$ correspond to the actual n, the observed s, and the chosen ε. We shall be wrong in our
assertions when, and only when, π lies outside the closed interval (6), in which case P < ε by (5). The
probability that our assertion is incorrect, which is the same as the proportion of times that we shall be
wrong in such assertions in the long run, is therefore less than ε. Therefore the probability that our
assertion (6) is correct, which is the same as the proportion of times that we shall in the long run be right
in such assertions, is greater than 1 − ε. We accordingly call $p_1$ and $p_2$ the lower and upper 1 − ε confidence,
or fiducial, limits.
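The selection rule defined by (2) to (4) can be illustrated numerically. The sketch below is only an approximation by grid search over π, not the calculation used for the published tables: the function names, the grid size `steps` and the small tolerances guarding ties between binomial terms are all assumptions introduced here.

```python
from math import comb


def binom_pmf(n, r, pi):
    """Binomial probability of exactly r successes in n trials, equation (1)."""
    return comb(n, r) * pi ** r * (1 - pi) ** (n - r)


def sterne_P(pi, n, s, tol=1e-12):
    """P(pi, n, s): total probability of outcomes as probable as or less probable than s."""
    terms = [binom_pmf(n, r, pi) for r in range(n + 1)]
    return sum(p for p in terms if p <= terms[s] + tol)


def sterne_limits(n, s, eps, steps=10000):
    """Smallest and largest grid values of pi for which P(pi, n, s) >= eps."""
    inside = [i / steps for i in range(steps + 1)
              if sterne_P(i / steps, n, s) >= eps - 1e-9]
    return inside[0], inside[-1]


if __name__ == "__main__":
    for n, s in ((1, 0), (2, 1), (5, 1)):
        print(n, s, sterne_limits(n, s, 0.5), sterne_limits(n, s, 0.10))
```

For n = 2 and s = 1 the 50 % limits can be found by hand: near the lower end P = 1 − (1 − π)², so the limit is 1 − √0.5, about 0.29, with 0.71 at the upper end by symmetry. The grid search returns these values to within the grid spacing.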

In the following tables are listed confidence limits $p_1$ and $p_2$ for values of ε equal to 0.5 and 0.10. These
limits may be called the 50 and 90 % confidence limits. The tables cover all possible values of s corresponding
to values of n from 1 through 10.

The author is inclined to prefer confidence limits selected in accordance with the procedure given in
this paper to confidence limits selected in accordance with the procedure of Clopper and Pearson (1934).
Our discussion has considered the probability P(π, n, s) of obtaining a number of successes as probable
as or less probable than an observed number of successes s. Their discussion considers the probability of
obtaining a number of successes equal to or less than the observed number and, separately, the probability
of obtaining a number of successes equal to or greater than the observed number. They select
an upper confidence limit in such a way as to make the first probability less than or equal to ½ε, and
a lower limit in such a way as to make the second probability also less than or equal to ½ε. There is,
however, no value of π that can make the second probability less than or equal to ½ε when s is zero, nor
is there any value of π that can make the first probability less than or equal to ½ε when s is n. Consequently,
the confidence limits selected by Clopper & Pearson are unnecessarily wide when s is either 0 or n, and
they have in effect over-modestly underestimated the legitimate degree of confidence in their assertions,


comparable to (6), for values of s equal to 0 and n. The confidence intervals defined here are certainly
narrower than theirs when s is 0 or n, and appear to be narrower in other cases when n is small. The
advantage of the present limits over those of Clopper & Pearson is probably small when n is large. The
present intervals are not 'central', in the sense in which the intervals of Clopper & Pearson are central,
but they are perhaps more nearly 'central' than those of Clopper & Pearson with respect to the probabilities,
P(π, n, s), of obtaining results s as probable as or less probable than some observed s, regardless
of whether such less probable s's are larger or smaller than the observed s.


Summary. In a binomial problem, confidence limits of an unknown parent probability can be 
selected on the basis of the probability, P, of obtaining a number of successes as probable as, or less 
probable than, the observed number s of successes in n trials. In comparison with confidence intervals 


Table giving confidence limits for π for samples of size n

                50%               90%
 n    s      p1      p2       p1      p2
 1    0     0.00    0.50     0.00    0.90
      1     0.50    1.00     0.10    1.00

 2    0     0.00    0.50     0.000   0.68
      1     0.29    0.71     0.051   0.949
      2     0.50    1.00     0.32    1.00

 3    0     0.00    0.37     0.000   0.54
      1     0.21    0.63     0.035   0.80
      2     0.37    0.79     0.20    0.965
      3     0.63    1.00     0.46    1.00

 4    0     0.00    0.29     0.000   0.50
      1     0.16    0.50     0.026   0.68
      2     0.29    0.71     0.14    0.86
      3     0.50    0.84     0.32    0.974
      4     0.71    1.00     0.50    1.00

 5    0     0.00    0.24     0.000   0.40
      1     0.13    0.41     0.021   0.60
      2     0.24    0.59     0.11    0.75
      3     0.41    0.76     0.25    0.89
      4     0.59    0.87     0.40    0.979
      5     0.76    1.00     0.60    1.00

 6    0     0.00    0.21     0.000   0.34
      1     0.11    0.35     0.017   0.54
      2     0.21    0.50     0.093   0.67
      3     0.35    0.65     0.20    0.80
      4     0.50    0.79     0.33    0.907
      5     0.65    0.89     0.46    0.983
      6     0.79    1.00     0.66    1.00

 7    0     0.000   0.18     0.000   0.35
      1     0.094   0.31     0.015   0.50
      2     0.18    0.44     0.079   0.65
      3     0.31    0.56     0.17    0.72
      4     0.44    0.69     0.28    0.83
      5     0.56    0.82     0.35    0.921
      6     (entries not legible)
      7     (entries not legible)

 8    0     0.000   0.16     0.000   0.31
      1     0.083   0.27     0.013   0.44
      2     (entries not legible)
      3     (entries not legible)
      4     (entries not legible)
      5     0.49    0.73     0.31    0.85
      6     (entries not legible)
      7     0.73    0.917    0.56    0.987
      8     0.84    1.00     0.69    1.00

 9    0     0.000   0.14     0.000   0.28
      1     0.074   0.25     0.012   0.39
      2     0.14    0.37     0.061   0.52
      3     0.25    0.50     0.13    0.61
      4     0.34    0.59     0.21    0.72
      5     0.41    0.66     0.28    0.79
      6     0.50    0.75     0.39    0.87
      7     0.63    0.86     0.48    0.939
      8     0.75    0.926    0.61    0.988
      9     0.86    1.00     0.72    1.00

10    0     0.000   0.13     0.000   0.25
      1     0.067   0.22     0.010   0.40
      2     0.13    0.36     0.055   0.50
      3     0.22    0.45     0.12    0.60
      4     0.30    0.55     0.19    0.66
      5     0.36    0.64     0.25    0.75
      6     0.45    0.70     0.34    0.81
      7     0.55    0.78     0.40    0.88
      8     0.64    0.87     0.50    0.945
      9     0.78    0.933    0.60    0.990
     10     0.87    1.00     0.75    1.00
selected in accordance with procedures of Clopper & Pearson, the present intervals appear to be some-
times narrower. Tables are given of the 50 and 90 % confidence limits of the parent probability for values 
of n up to 10. 

REFERENCE 


Clopper, C. J. & Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case
of the binomial. Biometrika, 26, 404-13. 


A note on the probability integral of the correlation coefficient 


By B. I. HARLEY, University College, London 


1. Tables of the probability integral of r, the correlation coefficient, have been given by David (1938)
for the population correlation coefficient ρ = 0.0(0.1)0.9 and the sample size n = 3(1)25, 50, 100, 200,
400. In the introduction to the tables it was shown that the normalizing transformation, z = tanh⁻¹ r,
given by R. A. Fisher (1921), provided adequate approximation to the integral for a large range of
values of ρ, but that for ρ in the neighbourhood of 0.9 the results became inaccurate. Recently values of
the integral were required for ρ = 0.9 and n between 25 and 50, i.e. beyond the tabular range, and it was
considered that the z-transformation would not give sufficient accuracy. An examination of the recurrence
formulae suggested by F. Garwood (1933) and recently by H. Hotelling (1953) did not promise
any greater accuracy, and some fresh method of approximation was looked for.


2. Since the z-transformation is a normalizing one, the distribution of

$$z = \tfrac12\log_e\frac{1+r}{1-r}$$

may be expected to be more nearly normal than that of r, whatever the value of ρ. It appears reasonable,
therefore, to use the cumulants of z in an expansion such as that given by Edgeworth, since this expansion
is known to give accurate results in the neighbourhood of the normal. The first four cumulants of z in
samples of n from a normal bivariate population are found from the values of the moments of z given by
A. K. Gayen (1951) to be:











[Equations (1) to (4), giving κ₁(z), κ₂(z), κ₃(z) and κ₄(z), are not legible in the source.]


Substituting these values in the Cornish-Fisher (1937) form of the Edgeworth expansion, taking a
preliminary mean and variance as

$$m = \tfrac12\log_e\frac{1+\rho}{1-\rho}, \qquad v = (n-1)^{-1},$$

we obtain the probability integral of z


[Equation (5), the expansion of the probability integral in the Hermite polynomials $H_r$ and the differences κ₁ − m and κ₂ − v, is not legible in the source.]

where

$$Z = \tfrac12\log_e\frac{1+R}{1-R} = m + x v^{1/2},$$

and R is the particular value of r for which the probability integral is required.
κ₁, κ₂, κ₃, κ₄ are the first four cumulants of z which are tabulated above in equations (1)-(4).
$H_r$, r = 1, 2, ..., 7 are the Hermite polynomials given by

$$\frac{d^r}{dx^r}\,e^{-\frac12 x^2} = H_r\,e^{-\frac12 x^2},$$

$$H_1 = -x, \qquad H_4 = x^4 - 6x^2 + 3,$$

$$H_2 = x^2 - 1, \qquad H_5 = -x^5 + 10x^3 - 15x,$$

$$H_3 = -x^3 + 3x, \qquad H_7 = -x^7 + 21x^5 - 105x^3 + 105x.$$
The expansion includes terms up to order n-*. 
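The polynomials defined by the derivative relation above can be generated mechanically, since differentiating $H_r e^{-\frac12 x^2}$ once more gives $H_{r+1} = H_r' - xH_r$. The sketch below uses that recurrence on plain coefficient lists (lowest power first); the function names are illustrative, and the sign convention is the one of the defining relation, so odd-order polynomials carry a leading minus sign.

```python
def next_hermite(h):
    """Given the coefficients of H_r (lowest power first), return those of H_{r+1} = H_r' - x*H_r."""
    derivative = [k * h[k] for k in range(1, len(h))]   # coefficients of H_r'
    shifted = [0] + h                                    # coefficients of x * H_r
    out = [0] * len(shifted)
    for k, coeff in enumerate(derivative):
        out[k] += coeff
    for k, coeff in enumerate(shifted):
        out[k] -= coeff
    return out


def hermite(r):
    """Coefficients of H_r, starting from H_0 = 1."""
    h = [1]
    for _ in range(r):
        h = next_hermite(h)
    return h


if __name__ == "__main__":
    for r in range(1, 8):
        print(r, hermite(r))
```

The recurrence also supplies $H_6 = x^6 - 15x^4 + 45x^2 - 15$, which is covered by the stated range r = 1, 2, ..., 7 although it is not written out in the list.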


3. For the purpose of testing the accuracy of the formula we took ρ = 0.9 and n = 25 and 50. The
value of the probability integral of r, obtained by using the normal z-transformation and the values of
κ₁(z) and κ₂(z) given in equations (1) and (2), was also calculated. Some results of calculations are given
in Table 1.


Table 1. Comparison of exact and approximate values of the probability integral of r

(i) n = 25, ρ = 0.9.

                          Approximate       Approximate
  r       Exact value     value from        value from
                          equation (5)      z-transformation
 0.75      0.00 743        0.00 742          0.00 698
 0.82      0.05 574        0.05 578          0.05 589
 0.84      0.09 859        0.09 862          0.09 981
 0.87      0.22 387        0.22 379          0.22 576
 0.90      0.46 244        0.46 247          0.46 250
 0.93      0.78 645        0.78 661          0.78 442
 0.95      0.94 612        0.94 612          0.94 609
 0.965     0.99 263        0.99 264          0.99 325

(ii) n = 50, ρ = 0.9.

                          Approximate       Approximate
  r       Exact value     value from        value from
                          equation (5)      z-transformation
 0.82      0.01 285        0.01 286          0.01 265
 0.85      0.05 998        0.06 001          0.06 024
 0.88      0.23 202        0.23 200          0.23 294
 0.90      0.47 403        0.47 404          0.47 405
 0.92      0.77 108        0.77 112          0.77 009
 0.93      0.88 871        0.88 872          0.88 814
 0.95      0.99 174        0.99 174          0.99 204





























It will be noted that the agreement between the value from equation (5) and the exact value from the
tables is even better for n = 50 than for n = 25, and the results suggest that the formula can be used 
with confidence for n between 25 and 50. 


4. The inverse Cornish-Fisher expansion can be used to obtain the significance levels for any n.
Inverting (5) we have for standardized z, say z’, 
where $\xi_\alpha$ is the unit normal deviate cutting off 100α % from the upper tail. κ₁, κ₂, κ₃, κ₄ are the first four


cumulants of z. The terms are arranged in descending order in n, and only those up to order n~* have 
been retained. 


For n = 25, ρ = 0.9 we obtain, by this means, a value of 1.83965 for the 5 % significance level of z,
or 0.95076 for r, a value whose accuracy could probably not be reached by interpolation in David's
tables.

For n = 50, ρ = 0.9 the 5 % significance level is 1.72055 for z, and 0.93793 for r.
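The passage from a significance level of z to the corresponding level of r uses only the inverse of the z-transformation, r = tanh z. A short Python sketch of that last step is given below; the function names are illustrative, and the z values are simply those quoted in the text, not recomputed from the expansion.

```python
from math import atanh, tanh


def z_to_r(z):
    """Invert z = (1/2) log_e((1 + r) / (1 - r)), i.e. r = tanh z."""
    return tanh(z)


def r_to_z(r):
    """Fisher's z-transformation, z = tanh^{-1} r."""
    return atanh(r)


if __name__ == "__main__":
    for n, z in ((25, 1.83965), (50, 1.72055)):
        print(n, z, round(z_to_r(z), 5))
```

Applied to the two quoted levels this returns values agreeing with 0.95076 and 0.93793 to the five decimal places given.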
A simplified expression for the variance of the χ²-function on a contingency table
By REED B. DAWSON, Jr., U.S. Department of Defense


Haldane (1940) considers the (m × n)-fold contingency table
Then the variance may be found from the formula
whose identity with the formula (4) given by Haldane may be verified directly. Note that whenever the
row (or column) sums are equal o (or 7) vanishes, eliminating the second term. 
REVIEWS


Elementary Medical Statistics. By D. Mainland. London: W. B. Saunders Company
Ltd. 1952. Pp. 327. 25s. 


The term ‘statistics’ is even less precisely defined than usual when prefaced by the epithet ‘medical’. 
Thus, a ‘medical statistician’ may be concerned with the application of standard biometric techniques 
in clinical and laboratory research; he may be engaged in the study of large-scale mortality and 
morbidity statistics, where sampling errors are usually unimportant; or he may have the largely 
administrative task of organizing and working a hospital records system. This book deals mainly with 
the first field of activity, and will interest medical research workers who have felt the need for a non- 
mathematical book on elementary statistical methods with examples drawn from the medical sphere. 

It is gratifying to find shrewd discussions on a number of problems which occur in medical statistics, 
and for which the research worker will find little guidance in the standard books. In clinical medicine, 
from which most of the examples are drawn, rigorous experimental methods are often, though not 
always, impossible, and Prof. Mainland carefully points out the dangers inherent in the interpretation 
of fortuitously collected data. He has some wise words, too, on the meaning (or lack of meaning) of 
so-called ‘normal’ values, and on errors in routine procedures like blood counting. His short sections 
on vital statistics, however, are less happy; the definitions of stillbirth and maternal mortality rates 
given on p. 36 do not apply in this country (where the denominator in each rate should be total births, 
live and still), and the expectation of life is wrongly interpreted on p. 255. 

The treatment of standard statistical methodology, although sound, is unfortunately marred by
a curiously disconnected exposition. A very good introductory section is followed by a chapter ‘On
looking at evidence’, in which a bewildering number of hares are chased and hastily dropped. Then 
follow 100 pages nominally on enumeration data, but in which is included an excellent discussion of 
the role of randomization in experimental design. Also included here is an exposition of confidence 
limits and significance tests for the binomial distribution—in that order, surprisingly; the usual 
difficulty in understanding confidence estimation is likely to be increased when the methods are first 
illustrated for a discrete distribution like the binomial. Two chapters on measurements, including 
a detailed treatment of the t-test, are followed by a chapter on statistical ideas in clinical practice, 
some features of which have already been mentioned, and a final chapter briefly introducing one-way 
and two-way analysis of variance, regression and correlation, probit and other transformations, and 
many other topics. There is a great deal of repetition throughout the book. Hints are constantly 
dropped about matters which are to be discussed in more detail later, and the reader is frequently 
referred to forthcoming sections. 

These faults make the book unsuitable as a primer in statistical methods, but it would be a pity if 
they prevented the medical research worker from reading the many excellent sections in it. He would 
benefit most from this book, if, before reading it, he first obtained a grounding in statistical methods 
from some other elementary text. 


A feature of the book which should interest the general statistician is the extensive series of tables
and charts for binomial confidence limits. P. ARMITAGE


A Statistical Primer. By F. N. David. London: Charles Griffin and Co. Ltd. 1953.
Pp. x+ 226, 22s. 


This book is intended for the non-mathematical research worker who wishes to learn some statistics, 
enough to enable him to perform ordinary tests and to design satisfactory experiments without other 
expert help. 

The non-mathematician is a timid creature, easily scared by the sudden appearance of an algebraic 
formula, unable to trust even the clearest mathematical demonstration ; but quite able to ‘get the idea’ 
if it is suitably presented to him. It is fascinating to watch how carefully the author stalks her prey 
in the excellently written early chapters of this work, avoiding any sudden movement that might 
cause him to fly off in alarm. But alas, statistics is really quite a difficult subject, and soon the reader
is grappling with a straightforward, and therefore rather difficult, chapter on moments.

The book then develops in a systematic way the various tests of significance—of a single mean and 
variance, then of paired means and variances. There is a wealth of numerical illustration to make 
calculation more certain, and the author takes special pains over a number of points which often give
worry to a beginner—for example, the question of one or two tails and of (N—1) instead of N in 
a denominator (though the frightening expression ‘degrees of freedom’ is introduced suddenly and 
unceremoniously !). 

Although the emphasis throughout is on elementary principles it is noticeable that there are no 
half-truths. Exposition is always rigorous with laudable emphasis on basic assumptions; for example, 
the effects of non-normality are described in connexion with each statistical test. 

The final chapters deal with the analysis of variance (in a very clear and simple way), with the binomial
and Poisson distributions and with the χ² test. Since all the other familiar tests have been covered it is
somewhat surprising to find no mention of the correlation coefficient. 

Short tables of the important statistical functions are included so this book can provide the complete 
armoury required for simple statistical work in the laboratory. D. R. WILKIE


Statistical Methods in Electrical Engineering. By D. A. Bell. London: Chapman
and Hall. 1953. Pp. viii+ 175. 25s. 


It is refreshing to find the author of a book of this type remarking: ‘Since one of the difficulties of the
beginner in statistics is that most of the introductory books are written for specialist fields. . .it seems 
at first a retrograde step to write another book “for engineers”.’ Such an admission should be 
treasured by those who believe that many difficulties in statistics arise from premature concern with 
specific applications before basic theory has been thoroughly assimilated. In the present book, how- 
ever, a good case is made out for specialization, on account of the particular theoretical developments 
of interest to the electrical engineer which find little or no place in elementary statistical theory. Thus 
the author devotes considerable attention to auto-correlation and power spectra, to fluctuations in 
electrical circuits, and to the use of indices of ‘information’ in communication engineering. 

Unfortunately, the fact that concern with specialized theory is more than usually justifiable does 
not eliminate the difficulty of combining specialization with a clear development ab initio of funda- 
mental statistical ideas. The small size of the book does not ease the task, but it does seem that good 
use has not been made of the available space. A general outline of the path followed by the author will 
elucidate this criticism. He starts with a discussion of probability theories, in the course of which 
the frequency theory is castigated for depending on the ‘undefined concept’ of ‘equally probable’. 
There follows (Chapter ii) a discussion of qualitative classification and, in particular, of contingency
tables. The concept of significance is first mentioned in the following sentence: ‘The idea of ‘‘degrees 
of freedom” is fundamental to the assessing of the ‘‘significance”’ of a variation, i.e. deciding whether 
an inference drawn from the statistics is likely to be valid or whether the observed variation is likely 
to be a random error of sampling.’ An all too common beginner’s error crops up on p. 22: ‘...if the 
observed value of χ² has, say, a 5% probability on the basis of a chance distribution there is a 95%
probability that it is due to some association between the different attributes.’ 

Chapter iii passes with bewildering brevity from observed distributions to binomial, Gaussian,
Poisson, Rayleigh, Maxwell-Boltzmann and Fermi-Dirac distributions. Chapter iv proceeds at an
equally breath-taking speed covering moments, measures of dispersion, etc., and characteristic
functions. In neither of these chapters is the distinction between sample and population made clear. 
Chapter v (Curve Fitting) discusses both the fitting of regression lines and distribution functions. 
Chapter vu (The Reliability of Data) is the only other chapter openly devoted to basic theory, though 
Chapter vu (Principles of Quality Control) is in effect a belated discussion of elementary ideas. 

It would be bad enough were the reader to be faced with the task of unscrambling correct ideas from 
confused presentation. There is, unfortunately, evidence that the author’s own appreciation of 
statistical theory is at fault. One example has already been quoted. We further find that on p. 104 
a t-statistic is referred to a t-distribution with 4 degrees of freedom because the arithmetic mean in the 
numerator is based on five observations, even though the standard deviation in the denominator is 
based on sixty observations. Confusion becomes worse confounded by a discussion which appears to 
mix up one-tail and two-tail tests. (Also the formula at the foot of p. 105 is in error by omission of
(2π)^(-½).) For a final example we look at pp. 101-2, where we find that the standard deviation of a mean
of N (independent) observations is 1/√(N − 1) times the standard deviation of each observation.

In view of features such as these the book cannot be recommended to engineers or others desiring 
a sound basic training in statistical theory. It may be of some value as a compendium of probabilistic 
techniques of interest to electrical engineers. N. L. JOHNSON 
Design for Decision. By Irwin D. J. Bross. New York: The Macmillan Company.
1953. Pp. viii+ 276. 30s. 


This book provides enjoyable reading and is an honest attempt to present current statistical ideas to 
a wide public. The author has taken considerable trouble to produce an interesting, popular text 
without hiding difficulties inherent in his subject. It is to be hoped that administrators, and scientists 
in general, will read this book with serious attention. Should this come to pass the author will have 
performed a valuable service to statistics in making its present-day bases, methods and capabilities 
known in much wider circles than is now the case. 

Decision theory, as is well-known in statistical circles, is the name given to the system developed by 
Wald in which he gave more formal emphasis to prior distributions and loss functions than did the 
Neyman-Pearson theory. Information on prior distributions and loss functions is often sparse and 
this had discouraged development of theory and still hampers practical applications. The author 
recognizes this difficulty and devotes considerable space to discussion of ‘Probability Systems’ and 
‘Value Systems’, i.e. the establishment of prior distributions and loss functions respectively. This 
discussion is followed by an interesting chapter on ‘Rules of Action’. Further chapters are entitled 
‘Operating a Decision-Maker’, ‘Sequential Decision’, ‘Data’, ‘Models’ and ‘Sampling’. The book 
might be improved if, with some rearrangement of material, it stopped at this point, except for the 
excellent final chapter on general aspects of decision problems. The remaining chapters tend to spoil 
the general picture by over-elaboration. ‘Measurement’ can be bettered by many elementary 
statistical texts; ‘Statistical Inference’ is an inadequate account of the classical theory of tests and 
estimation implying an artificial distinction between Inference-Makers and Decision-Makers; while 
the penultimate chapter on ‘Statistical Techniques’ is competent but dull and necessarily very 
compressed. 

Although the book is written for popular consumption, professional statisticians should find it of 
very real value. It should stimulate renewed, and serious, thought about the foundations of their 
subject, and about their responsibilities in fashioning an instrument which may possibly affect deeply 
the mental outlook of their fellows in years to come. It is also an excellent antidote to those overdoses 
of mathematical trivia which we can at present hardly avoid. N. L. JOHNSON 


Statistical Theory in Research. By R. L. Anderson and T. A. Bancroft. New York: 
McGraw-Hill Book Co. Ltd. 1952. Pp. xviii+ 389. 59s. 6d. 


Among the many statistical text-books now pouring from the printing presses of many countries this 
book written by R. L. Anderson and T. A. Bancroft is outstanding. The authors divide it into two parts; 
the first which they entitle ‘Basic Statistical Theory’ is just what it says. It is a competent exposition 
of the corpus of statistical theory which we might now term orthodox and consists of material which 
has been expounded and mis-expounded by many authors. 

The second part entitled ‘Analysis of Experimental Models by Least Squares’ is interesting, useful, 
informative and stimulating. It contains chapters on regression analysis, regression with variates, 
curvilinear regression, complete and incomplete blocks, factorial experiments, analysis of variance 
components, and discussion of the analysis of data when the model is a mixture of random and 
systematic effects. Various computational methods including the inversion of matrices are included. 

The treatment throughout is unified and mathematical clarity is achieved. What is somewhat 
surprisingly linked with theoretical clearness is adequate arithmetical exposition of problems allied to 
theory. The book is to be recommended to all students following a course of analysis of variance and 
experimental design and to those working statisticians who carry out arithmetical operations every 
day and who would like to know more of the theoretical background of their operations. 

F. N. DAVID 


Advanced Statistical Methods in Biometric Research. By C. R. Rao. New York: 
John Wiley and Sons Inc.; London: Chapman & Hall. 1952. Pp. xvii+383. 60s. 


This book, one of a useful series published by John Wiley, is refreshing to read after the number of 
books with more or less stereotyped outlook which have been published during the years since the 
war. Dr Rao has confined himself for illustrative purposes to the field of biometry and in particular 
to the field of anthropometry, but he has actually written a text-book of statistics which is interesting 
and to a certain extent novel in exposition. 

The first thirty pages cover the development of the matrix algebra, an understanding of which is 
necessary for the development of multivariate analysis. There follow chapters covering distribution 
theory, the testing of hypotheses, the theory of estimation and the method of maximum likelihood, 
all of which are familiar, and which have been covered by other authors. It is only when some two- 
thirds through the book that Dr Rao begins the development of the theory to which the Indian school 
of statisticians has made such striking contributions. Most research workers nowadays have at least 
a nodding acquaintance with the problems arising from multivariate analysis, Mahalanobis’s D? 
criterion and the discriminant functions of R. A. Fisher. Instead of having to search statistical 
literature for the exposition of these various statistical techniques, they will now find them gathered 
together and given coherence in the last three chapters of Dr Rao’s book. Probably all workers in the 
biometric field can read these chapters with profit, even if many will ultimately shirk the arithmetic 
and turn to Penrose’s work. 

Having said, or implied, that this is a useful book the reviewer would make one pointed criticism. 
The theme of the book is an extract by R. A. Fisher: ‘The value of the result (to the statistician) 
follows solely from the value of the material given him. It contains so much information and no more. 
His job is only to produce what it contains.’ On first reading this one thinks how true. But, and this 
is important, the statistician automatically puts more into the material than it contains every time he 
makes the basic assumption of normality. Sometimes this assumption does not matter; the reviewer 
has the suspicion that in multivariate tests it may on occasion matter very much, and certainly 
insufficient emphasis is laid on this possibly important point. 

The book is reasonably well provided with references to original literature with the possible excep- 
tion of researches carried out by students of Neyman and Pearson. For example, Kolodzieczyk, who 
did not survive the last war and who made an important contribution to the theory of testing linear 
hypotheses, merits more than an end-of-chapter reference. Readers will be able to supply other 
omissions for themselves. F. N. DAVID 


Sampling Methods for Censuses and Surveys, 2nd edition. By F. Yates. London: 
Charles Griffin and Co. Ltd. 1953. Pp. xvi+401. 38s. 


The first edition of this book immediately became the standard manual for survey practitioners. Its 
practical detail and many worked examples have also made it very useful to teachers, although it 
makes no attempt to develop the theory of the subject, and therefore requires supplementation by 
a text such as that recently published by Prof. Cochran. 

This new edition leaves unchanged the earlier text, but adds two new chapters. Chapter 9 is devoted 
to methods of critical analysis of surveys of an investigational character, special attention being paid 
to the treatment of contrasts between domains of study, which may cut across the original stratification 
of the population. This important problem is commonly encountered, and there will be a general 
welcome for its inclusion among the new material. 

Chapter 10 is a collection of miscellaneous developments, largely devoted to bringing up to date the 
exposition in the first edition. The fact that the index has been extended to cover the new material 
means that the fragmentary nature of Chapter 10 will not prevent its being of use even to those not 
familiar with the main body of the book. 

The select bibliography which was a feature of the first edition has been supplemented, and, for the 
benefit of those new to the subject, a short course of reading in the book has been indicated. 

It is a tribute to the quality of the first edition that no competitor has appeared; despite the jump 
in price from 24s. to 38s., none is likely to rival this second edition. ALAN STUART 


Stochastic Processes. By J. L. Doob. New York: John Wiley and Sons Inc.; London: 
Chapman and Hall. 1953. Pp. vi+65. 80s. 


Since the birth of modern probability occurred with the fundamental paper by Kolmogoroff, the 
development of the mathematics of probability has been almost exclusively the work of the French 
and Russian schools, or of scholars who have received their basic training in these schools. The principal 
exception to this French-Russian monopoly is J. L. Doob, and probabilists everywhere will read his 
book with interest. Prof. Doob takes as his foundation the axiomatic definition of Kolmogoroff and 
probability to him is a branch of the theory of Lebesgue measure. Starting from here, what Prof. Doob 
has really written is a text-book covering the mathematical development of probability theory, the 
emphasis being on stochastic processes because it is here that such rapid strides have been and are 
being made. It is in this field, incidentally, that Prof. Doob himself has made such distinguished 
contributions. 

The book will be of little use to anyone who has not been trained in modern methods of mathematical 
analysis; it will be no use at all to the non-mathematician seeking for applications. For the mathe- 
matical statistician seeking to make rigorous his knowledge of probability theory and wishing to learn 
up to the boundary of present day research it will be indispensable. It is definitely a specialist’s book, 
which is obviously what the author intended, and within the limited field it will be without rival for 
years to come. Statisticians tend to use loose mathematical arguments, chiefly because it is not 
possible to solve their problems any other way, but it is well that there are probabilists such as Prof. 
Doob stressing the formal mathematical arguments in what is, after all, the root and branch of our work. 

The price is perhaps an inhibiting factor from the British point of view. Nevertheless new and old 
workers alike in the probability field will find it a necessity for their personal libraries. F. N. DAVID 


Cambridge Elementary Statistical Tables. By D. V. Lindley and J. C. P. Miller. 
London: Cambridge University Press. 1953. Pp. 36. 5s. 


The page size of these tables is 11 × 8½ in. The cover is of strong paper. Priced at 5s., they will be 
considered to be good value by many occasional users of statistical methods and by students taking 
elementary courses in statistics at school and at the university. 

About half of the space is allotted to a table of values of x², √x, √(10x), x^(-1), x^(-1/2), (10x)^(-1/2) and log x 
for x = 1.00 by intervals of 0.01 to x = 10.00. In general, four-figure accuracy is given and first 
differences are printed. There is also an antilogarithm table and a table of log n! up to n = 300. 

Most of the remaining space is devoted to the common statistical distribution functions. Areas under 
the normal curve are given in general to four figures against values of the argument proceeding by 
intervals of 0.01. There are also very brief tables of normal ordinates and percentage points. Then 
there are tables of percentage points of t, χ² and F. Other short miscellaneous tables complete the 
booklet. 

Collections like the present one naturally invite comparison with larger collections such as those of 
Fisher and Yates and of Hald, and with the Biometrika Tables for Statisticians, recently revised and 
edited by Pearson and Hartley. These larger volumes differ either by providing more figures in the 
basic tables, or by providing many specialized tables designed for specific statistical problems, or by 
including much explanatory matter to illustrate the statistical applications. There is little point, 
however, in attempting to criticize the present Cambridge publication on the ground that it does not 
contain much useful matter that is found in these larger collections. Granted the size and the price 
the authors have in the opinion of the reviewer made an excellent choice of what should be included. 
The printing is clear and the booklet easy to handle. It should have a large circulation among students 
and among those practising scientists whose work requires some acquaintance with statistical methods. 

B. L. WELCH 


PUBLICATIONS OF U.S. DEPARTMENT OF COMMERCE, 
NATIONAL BUREAU OF STANDARDS 


(i) Tables of normal probability functions. Applied Mathematics Series 23. 1953. 
Pp. 344. Price $2.75. 


These tables were originally issued in 1942 as no. 14 in the Mathematical Tables Project; the present 
edition incorporates certain corrections to errors discovered in the arguments which had been already 
included in the 1948 edition. 

The main Table I contains values computed to 15 decimal places of the two functions 

$$Q(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad P(x) = \frac{1}{\sqrt{2\pi}} \int_{-x}^{x} e^{-\alpha^2/2}\, d\alpha$$

for x = 0.0000 (0.0001) 1.0000 (0.001) 7.800. For larger values of x, the functions converge rapidly to 
0 and 1, respectively, and their values to 15 decimals may be read by inspection from the last page of 
the table. 
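
For readers without the tables to hand, the two tabulated functions can be approximated with the Python standard library. This is only a sketch: it assumes that Q(x) is the normal ordinate and that P(x) is the area between −x and x, and the function names are illustrative. Double-precision arithmetic gives roughly 15 to 16 significant figures, so it is not guaranteed to reproduce the tables to their full 15 decimal places.

```python
import math


def normal_ordinate(x):
    """Q(x): ordinate of the standard normal curve at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def central_area(x):
    """P(x): area under the standard normal curve between -x and x
    (assumed definition), which equals erf(x / sqrt(2))."""
    return math.erf(x / math.sqrt(2.0))


def tail_complement(x):
    """1 - P(x), computed with erfc to avoid cancellation for large x."""
    return math.erfc(x / math.sqrt(2.0))


if __name__ == "__main__":
    for x in (0.5, 1.0, 1.96, 6.0):
        print(x, normal_ordinate(x), central_area(x), tail_complement(x))
```

Computing 1 − P(x) through `math.erfc` rather than by subtraction matters in the range of the supplementary table, where P(x) is indistinguishable from 1 in ordinary floating point.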











A supplementary table (Table II), covering four pages, gives values of Q(x) and of 1— P(x) to seven 
significant figures in the range 


x = 6.00 (0.01) 10.00. 


A short Introduction by A. N. Lowan describes methods of direct and inverse interpolation and ends 
with a Bibliography of some of the more important existing tables of normal probability functions. 


(ii) Table of natural logarithms for arguments between zero and five to sixteen 
decimal places. Applied Mathematics Series 31. 1953. Pp. 601. Price $3.25. 


This table is a reissue of vol. 3 of the Mathematical Tables Project, Table 10, first published in 1941. 
It gives to 16 decimal places values of log_e x for x = 0.0000 (0.0001) 5.0000. No revisions in the original 
tabular content have been found necessary. There is a short Introduction by A. N. Lowan. 

From the prefatory remarks it is understood that a further volume will complete the reissue, giving 
values of log_e x for x in the range 5-10. E. S. PEARSON 


50-100 Binomial Tables. By Harry G. Romig. New York: Wiley; London: Chapman 
and Hall. 1953. Pp. xxvii+172. 32s. 


These tables were prepared while the author was associated with the Bell Telephone Laboratories and 
issued in 1947 in a preliminary draft. They give to six decimal places: 

(a) the individual terms ^nC_x p^x (1 − p)^(n−x); 

(b) the cumulative sum of terms (probability of x or less) of the binomial expansion (q + p)^n, where 
q = 1 − p, for the range of arguments 


n = 50(5)100, p = 0.01(0.01) 0.50. 


The tables break new ground in the sense that fresh computation was required, whereas the National 
Bureau of Standards Tables (Applied Mathematics Series 6, 1950) which covered the range n = 2(1) 49, 
were derivable from Karl Pearson’s Tables of the Incomplete Beta-function. 

The author gives a helpful Introduction indicating, among other properties, the relation between 
the sums of binomial terms and the Incomplete Beta-function; this relation makes it possible to obtain 
from the present tables values of the latter function falling well beyond the range of Karl Pearson’s 
tables. 
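
The relation in question is the standard identity P(X ≤ x) = I_q(n − x, x + 1) for 0 ≤ x < n, where I denotes the regularized Incomplete Beta-function and q = 1 − p. The sketch below checks it numerically in Python; the function names are illustrative, the Simpson's-rule integration is a simple stand-in chosen for self-containment, and nothing here is taken from Romig's Introduction.

```python
import math


def binomial_cdf(x, n, p):
    """Probability of x or fewer successes in n trials."""
    q = 1.0 - p
    return sum(math.comb(n, k) * p**k * q**(n - k) for k in range(x + 1))


def regularized_incomplete_beta(z, a, b, steps=4000):
    """I_z(a, b) by composite Simpson's rule (steps must be even).
    Adequate for a, b >= 1, where the integrand is smooth on [0, z]."""
    h = z / steps

    def f(t):
        return t ** (a - 1) * (1.0 - t) ** (b - 1)

    total = f(0.0) + f(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (total * h / 3.0) / math.exp(log_beta)


if __name__ == "__main__":
    n, x, p = 50, 10, 0.3
    print(binomial_cdf(x, n, p))
    print(regularized_incomplete_beta(1.0 - p, n - x, x + 1))
```

Read in the other direction, a cumulative binomial sum for n between 50 and 100 yields a value of the Incomplete Beta-function with first argument n − x, which is how the tables extend beyond the range covered by Karl Pearson's.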

A companion volume is promised, covering in detail various methods of interpolation that may be 
needed to make best use of the material provided. A summary is given of the most commonly needed 
interpolation formulae. Since the argument interval for n is 5, exact formulae for obtaining individual 
or cumulative terms of binomials with argument n + 1, n + 2 in terms of those for n are of course needed 
and, in fact, take a simple form. E. S. PEARSON 
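
One simple exact relation of this kind is b(x; n + 1, p) = q·b(x; n, p) + p·b(x − 1; n, p), which follows from conditioning on the outcome of the extra trial. The Python sketch below applies it to step from n to n + 1; it is offered as an illustration of the idea, not as the form of the formulae given in the book, and the function names are hypothetical.

```python
import math


def binomial_terms(n, p):
    """Individual terms b(x; n, p) for x = 0, 1, ..., n, computed directly."""
    q = 1.0 - p
    return [math.comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]


def step_up(terms, p):
    """Terms for n + 1 from the terms for n, using
    b(x; n+1, p) = q * b(x; n, p) + p * b(x-1; n, p)."""
    q = 1.0 - p
    # Pad with zeros so that b(-1; n, p) and b(n+1; n, p) are both 0.
    padded = [0.0] + list(terms) + [0.0]
    return [q * padded[x + 1] + p * padded[x] for x in range(len(terms) + 1)]


if __name__ == "__main__":
    p = 0.3
    tabulated = binomial_terms(50, p)            # stands in for a tabulated n
    derived = step_up(step_up(tabulated, p), p)  # terms for n = 52
    direct = binomial_terms(52, p)
    print(max(abs(a - b) for a, b in zip(derived, direct)))
```

Because the relation is linear, the same recurrence holds for the cumulative sums, so either tabulated quantity can be carried from a tabulated n to a neighbouring value.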